GitHub@mujiu555's page

1 Hello, world!

This is my personal page, hosted on GitHub Pages and powered by Typsite.

It records some of my works and articles, derived manually from my repository Wishful-Thinking. Could someone help me automate this process? Thanks a lot! (For example, using GitHub Actions.)

1.1 Self Intro

Here is a brief introduction about myself:

  • College student majoring in Information and Computing Science.
  • pre-OIer (but really bad at it).
  • I have learnt many different programming languages, with different design principles:

    • C/C++, Rust, C#, Common Lisp, Scheme programming.
    • Python hater, but sometimes I use it for most of my research work.
    • Assembly learnt.
    • I swear I am trying my best to learn Haskell/Idris2 for my interest in Programming Language Theory.
  • BTW I use NixOS. (Yeah, I was an Arch user.)
  • Not a femboy! Meow~

    • It is just a natural look, I swear!
    • Normal passing by weeb.
    • Ignore my strange behaviours, please~~~
1.1.1 Github: mujiu555

Follow please! I can do anything for you!

1.1.2 We are proud of being sick

WE ARE NERDS, FREAKS, GEEKS, OTAKU, WEIRDOS, LONERS, SOCIALLY AWKWARD, ASPERGER, AUTISTIC, ADHD, ETC.

But the reason is that we have our interests, our passions, our dreams.

The world does not understand us, but we understand ourselves.

We are not afraid of being different.

We are proud of being sick, being special. Wondering about our future, dreaming of our future.

We are not losers; we are doing what we love.

2 Works
2.1  Contents [contents]
2.1.1 Overview
2.1.1.1  Not actually even a diary [diary]
2.1.1.1.1 2025-12-21 01:35: Nothing Special

Of course, this should be written in the form of mail. Or, in my original plan, as chat history.

Too difficult to design a function.

2025-12-21 01:42
Self
With nickname, made up, date time, when it was sent, and content. But after all, just keep it simple. Not bad to call a library.
2025-12-21 02:11
It works now.
2025-12-21 02:12
2.1.1.1.2 2025-12-21 02:13: Silent Message
2025-12-21 02:13
Self

You know, I am always alone. There is no one to talk to.

They never care about what I say, the things I care about.

Assembly code, computation theory, mathematics: the things they never understand, the things they never ever care about.

I cannot even cry. Nobody will give even a glance.

Right, you know.

Oh, my poor stupid self.

2025-12-21 02:23
Self

The only reason you suffer is that you are making every effort to spare others.

So stupid.

What the fuck have you even written?

In fucking English.

You desire care.

Long for love.

2025-12-21 02:30
Self

No, never, ever, think about girls. You stupid motherfucker.

The only thing you can achieve is messing up everything.

Weeb, ah,

They won't ever even want you to be their friend.

Understand?

2025-12-21 02:37

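The only ones that ever message you first are Steam promotions and GitHub Education notifications.

Hilarious.

2025-12-21 02:39

Three years of college and nothing at all to show for it,

and still going on about lilies. Daydreaming.

Go on playing with your crappy assembly every day. You cannot even afford a meal.

Who would pay attention to you?

2025-12-21 02:40

Unpopular too, with a personality like crap.

Doesn't looking in the mirror make you sick?

Computer science, sure.

2025-12-21 02:41

Slacking off every day, doing nothing useful and just fooling around.

Finals are almost here; you are doomed.

2025-12-21 02:42
So lazy you are rotting; too slow even for the leftovers.
2025-12-21 02:43

What can you do every day besides fantasizing that someone likes you?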

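2.1.1.1.2.1 2025-12-21 18:38: Slacking-off diary

Honestly, reading what I post in my late-night rants is embarrassing every day now. So chuunibyou.

Self

Some people are so fortunate, so... happy, that they have no idea, no experience, of what pain is.

How enviable, how... it cannot even be called jealousy; I can only wish them well.

Because such things, such happiness, will never belong to us. Never.

Until we die, leaving a lifetime of pain behind.

Our fate, our duty, our responsibility cannot be changed.

Do not harbor feelings you should not have. Since it does not belong to you, never hope for it.

The day you die, flowers will fall, but not for you.

Let the age roll on and grind the mortal world to dust; it alone will not remember you.

This is us, pathetic to the extreme.

"Scum, trash"

Because of selfishness, because of cowardice, without ambition, idle in body.

I only want to die this ordinary way.

Unable to bear any more of our past, my future.

Moaning without being ill.

Only a peaceful death can give comfort.

Take us away, until dawn.

How I wish I knew how to go on living.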

2.1.1.1.3 2025-12-22 17:52: We choose to give up

You should understand that the emotion called love comes from chemical reactions in the brain. It is something not logical, not rational.

The purpose of life, which we are born with, assigned by the creator, is to survive and to reproduce. It is not our choice to make.

We choose to give up, we choose to fight for ourselves, we choose to discard natural selection.

The idea is that emotions, derived from chemical reactions in the brain, affected by hormones and distorted by the environment, drive us crazy and make us nonsensical.

History won't remember us, but we will remember ourselves.

2.1.1.1.4 2026-01-02 23:35:

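We once longed for the stars, but time wore our passion away

Yet we will still be mad, for the future, for foolishness

Just pursue the visible, the invisible, future

The future we will eventually reach, which cannot be changed

The sorrowful, painful, forever mournful future

We already know the future cannot be achieved; we already know the future is dim

We pursue, we explore, we stop at nothing

Only to move forward, only for pain

Whipping, disappointment, ridicule, always accompany us

Loneliness, anger, fear, always show

The future is an anchor, is knowledge, is an exit

This is knowledge both knowable and unknowable, which cannot be passed on, only a moment of recognition

2.1.1.1.5 2026-01-05 01:28: May the gods grant me eternal rest
2.1.1.1.6 2026-01-24 19:39: Found a boyfriend!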
2.1.1.2  Assembly, Principles of Computer Composition, Computer Organization and Architecture, and Operating System (Section I) [S1]
2.1.1.2.1 Assembly, Principles of Computer Composition, Computer Organization and Architecture, and Operating System
2.1.1.2.1.1 Section I: Basic Assembly
2.1.1.2.1.2 Coding, Numeration, Radix

Values are plain bits, expressed as high or low voltage levels, that may represent some information. Together with the corresponding context or encoding, and with its own properties such as a name, a value can be interpreted as real information: the data. Raw information must have some way to be stored. The way to translate the original data into values that can be stored in a computer is called "coding". Encoding converts data into a specific format or representation.

Coding helps people understand data.

2.1.1.2.1.2.1 Symbol, Calculation & Presentation

Calculations are relations between different data. Concretely, they manipulate different values in different codings.

2.1.1.2.1.2.2 Decimal

Decimal integers are numbers in base ten, which means every number represented in decimal form contains only the digits 0-9. Every digit's value is weighted by a position-dependent power of 10.

2.1.1.2.1.2.3 Binary

Binary integers are numbers in base two: whenever a digit would reach the value 2, it results in a carry. Thus only 0 and 1 appear in a binary representation. Digits in a binary representation are called "bits".

Every bit's value is weighted by a position-dependent power of 2. For example, binary 1101 is 8 + 4 + 0 + 1 = 13 in decimal.

2.1.1.2.1.2.4 Hexadecimal, Octal

Hexadecimal numbers are in base 16, while octal numbers are in base 8.

2.1.1.2.1.2.5 Radix conversion

For reference, see radix.
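
The positional rule from the sections above can be sketched in a few lines of Python. This is only an illustration: the names `DIGITS`, `to_radix` and `from_radix` are made up for this page, and only non-negative integers and radixes from 2 to 16 are handled.

```python
DIGITS = "0123456789ABCDEF"  # digit symbols for radixes up to 16


def to_radix(value: int, radix: int) -> str:
    """Convert a non-negative integer to its digit string in the given radix."""
    if not 2 <= radix <= len(DIGITS):
        raise ValueError("radix must be between 2 and 16")
    if value < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        # The remainder is the lowest digit; the quotient holds the rest.
        value, remainder = divmod(value, radix)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


def from_radix(text: str, radix: int) -> int:
    """Evaluate a digit string: each digit is weighted by a power of the radix."""
    result = 0
    for ch in text.upper():
        digit = DIGITS.index(ch)  # raises ValueError for unknown symbols
        if digit >= radix:
            raise ValueError(f"digit {ch!r} is not valid in radix {radix}")
        result = result * radix + digit
    return result


print(to_radix(2025, 2))      # 11111101001
print(to_radix(2025, 16))     # 7E9
print(from_radix("3751", 8))  # 2025
```

Repeated division yields the digits from lowest to highest, which is why the list is reversed at the end.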

2.1.1.2.1.2.6 Data, Numbers, Computer

Data is represented as binary numbers in a computer.

Each cell of a calculation unit can only have two states, open and closed, which have a natural one-to-one correspondence with binary bits.

2.1.1.2.1.3 CPU, BUS, Memory

The most important part of a computer is the CPU. The CPU, central processing unit, controls almost all calculation processes of a computer.

Furthermore, the ALU, arithmetic logic unit, is the core of the CPU. The ALU is responsible for arithmetic and logical computations. Without an ALU, the CPU would be unable to perform its core operations.

Registers are another core part of the CPU; they give the CPU the ability to store data.

The CU, control unit, the front-end of a CPU, controls the behaviour of the whole CPU. The CU fetches commands, preprocesses them and decides the order of command execution. Preprocessing of commands can include PreDecode, Decode, Micro-Fusion / Macro-Fusion, Branch Prediction and Static Prediction.

To speed up floating-point calculation, some CPUs also have an FPU, floating point unit.

Memory access is another function a CPU must have, so the AGU, address generation unit, or ACU, address calculation unit, helps the CPU calculate address offsets into main memory.

The MMU, memory management unit, a control unit that may sit outside the CPU, controls memory and maps logical memory addresses to physical addresses.

The TLB, translation lookaside buffer, is a critical cache for memory management. Every time the CPU maps an address and fetches data from memory, it may consult the TLB, so that address translation is sped up by reusing an existing mapping entry.

The cache is a general-purpose buffer for data fetched from memory. Once data is cached, later accesses to it are much faster than accesses to data that exists only in main memory. When cached data has been changed, it is written back to memory afterwards.

2.1.1.2.1.3.1 Data, Instructions

Data, the raw information, takes on a specific meaning once it is interpreted together with its context and name. Instructions are represented in the same way as regular data: as binary numbers.

Data is what a computer processes, and instructions specify how to do so.

2.1.1.2.1.3.1.1 Dimension, Unit

To measure how much data there is, units need to be specified.

Unit    Conversion  From
bit     /           None
Byte    8           bit
KiB     1024        Byte
MiB     1024        KiB
GiB     1024        MiB
TiB     1024        GiB
PiB     1024        TiB
kB      1000        Byte
MB      1000        kB
GB      1000        MB

The most commonly used unit in computers is the Byte; it is also the smallest data unit most computers can address.

In information theory, the smallest unit is the bit, which is also the smallest unit for measuring memory. For most memory (SRAM, DRAM), the smallest storage cell also holds one bit. In most architectures memory is addressed in bytes, but some special processors can address individual bits, and some others address by word or double word.

Processors may treat data differently as well. As for processing granularity, a byte is typically the smallest independently loadable/storable object, whereas the minimum operand width for arithmetic/logic operations depends on the ISA (commonly 8/16/32/64 bits).
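
To make the difference between the binary (1024-based) and decimal (1000-based) units concrete, here is a small sketch. The helper `unit_to_bytes` and the two lists are names invented for this example.

```python
BINARY_UNITS = ["KiB", "MiB", "GiB", "TiB"]  # each step multiplies by 1024
DECIMAL_UNITS = ["kB", "MB", "GB"]           # each step multiplies by 1000


def unit_to_bytes(unit: str) -> int:
    """Return how many bytes one of the given unit contains."""
    if unit == "Byte":
        return 1
    if unit in BINARY_UNITS:
        return 1024 ** (BINARY_UNITS.index(unit) + 1)
    if unit in DECIMAL_UNITS:
        return 1000 ** (DECIMAL_UNITS.index(unit) + 1)
    raise ValueError(f"unknown unit: {unit}")


print(unit_to_bytes("GiB"))      # 1073741824
print(unit_to_bytes("GB"))       # 1000000000
print(unit_to_bytes("KiB") * 8)  # 8192 bits in one KiB
```

The gap between the two families grows with every step, which is why a drive sold as a decimal size shows up as a smaller number of binary units.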

2.1.1.2.1.3.2 Harvard, von Neumann Architecture

As mentioned before, data and instructions are both stored in binary form. So the CPU cannot actually tell whether a piece of memory stores data or instructions.

Thus there are two methods to store them.

One is the "von Neumann architecture", in which data and instructions share the same memory space. In this case, it depends on context to distinguish which one is data and which one is an instruction.

The other is the "Harvard architecture", in which data and instructions are stored in two different memories.

The von Neumann architecture gives programmers the flexibility to treat data as instructions, which makes self-modifying code possible. For example, some JIT compilers are implemented in such a way.

The Harvard architecture, however, prevents data from being treated as instructions. Though it reduces flexibility, ambiguity is prevented.

2.1.1.2.1.3.3 Program Counter, Instruction Register

How does a program execute? The CPU reads instructions and then executes them. Both the Harvard and the von Neumann architecture follow this process.

But how does the CPU read instructions? Let's consider the von Neumann architecture first. Data and instructions are mixed in memory for a von Neumann processor. Thus there must be something that records where the next instruction is, so that the processor does not read the wrong memory. Each time the processor wants to execute the next instruction, it refers to this record. After the processor has executed one instruction, the record moves on to the next instruction, so that the processor executes the whole program in the specified sequence rather than executing one instruction repeatedly. What happens when we switch to a Harvard architecture processor? Even though the place where instructions are stored is fixed during a program's execution, the processor must still know how many instructions it has executed and where the next instruction is.

Thus, in practice, there must exist an abstract register, called the "Program Counter" (PC), that tracks instruction execution.

But where does the CPU read instructions to? To parse an instruction and learn its detailed execution information, the CPU first reads the instruction at the address given by the PC, and then puts what it has read into the IR, the "Instruction Register".

Those instructions are then parsed and analyzed, and take effect.

PC and IR are both abstractions of physical registers. They may not exist under these names in a real CPU, but there must exist a register, or a group of registers, that performs the function they describe.

2.1.1.2.1.3.4 Memory Address Register, Memory Buffer Register, Memory Data Register & Memory

When the CPU accesses memory, it also needs something to record where it means to read. Just as the PC records which instruction should execute next, the MAR, "Memory Address Register", records which memory location should be accessed next. And just as the IR holds the instruction that was read, the MBR, "Memory Buffer Register", holds data read from memory.

In some cases, the MBR is also called the "Memory Data Register", MDR.

Furthermore, and most importantly, MAR and MBR are likewise abstractions and not necessarily real registers.

2.1.1.2.1.3.5 Fetch-Execute Cycle

When the CPU executes programs, it follows the fetch-execute cycle. Until it receives a halt instruction, it repeats the read, decode, execute process.

Instructions are stored in memory, and the CPU must read them before it can decode them. The CU controls the whole process of reading and decoding.

The CPU first determines the memory address according to the PC and sends it to the MAR. The MAR holds the address and communicates with main memory. Main memory passes the requested data or instruction to the CPU over the bus, where it is stored in the MBR. The IR then takes the instruction from the MBR, and the instruction is split into an operator part and an address part. Finally, the ALU is called to actually execute the instruction.

This is a full fetch-execute cycle of the CPU.
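
The cycle can be imitated with a toy machine. Everything in this sketch is hypothetical: the four opcodes, the single accumulator and the `(opcode, address)` instruction format are invented for illustration and do not correspond to any real ISA. Instructions and data share one memory, as in the von Neumann architecture.

```python
# Hypothetical toy instruction set: each instruction is an (opcode, address) pair.
LOAD, ADD, STORE, HALT = "LOAD", "ADD", "STORE", "HALT"


def run(memory):
    """Run the fetch-execute cycle until HALT; return the accumulator."""
    pc = 0   # Program Counter: address of the next instruction
    acc = 0  # accumulator used by the toy ALU
    while True:
        mar = pc               # fetch: the address goes to the MAR
        mbr = memory[mar]      # memory returns the word into the MBR
        ir = mbr               # the IR takes the instruction from the MBR
        pc += 1                # the PC now points at the next instruction
        opcode, address = ir   # decode: operator part and address part
        if opcode == HALT:     # execute
            return acc
        if opcode == LOAD:
            acc = memory[address]
        elif opcode == ADD:
            acc = acc + memory[address]
        elif opcode == STORE:
            memory[address] = acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")


program = [
    (LOAD, 4),   # acc = memory[4]
    (ADD, 5),    # acc = acc + memory[5]
    (STORE, 6),  # memory[6] = acc
    (HALT, 0),
    2, 3, 0,     # data stored at addresses 4, 5 and 6
]

print(run(program))  # 5
print(program[6])    # 5
```

Note that nothing in `memory` marks a cell as instruction or data; only the PC decides which cells are interpreted as instructions.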

2.1.1.2.1.3.6 CISC & RISC

CISC, Complex Instruction Set Computer, is a family of architectures that tries to improve performance by decreasing the number of instructions needed for specific operations. In general, a CISC computer has more special-purpose instructions, so that it can perform a complex operation with a single instruction. Instructions used by CISC are often multiple bytes long, and their length may vary with their purpose. The total number of CPU cycles consumed by an instruction may also vary. CISC designs typically provide various methods for memory access.

RISC, Reduced Instruction Set Computer, tries to reduce the number of instruction types. Since most instructions in CISC are not used frequently, and some of them can be seen as combinations of simpler, frequently used instructions, improving the performance of the basic instructions may give higher performance overall, and it makes ISA design simpler as well. Instructions in RISC typically have a fixed length, and most of them are designed to complete in one CPU cycle. With pipelining, the effective time per instruction can be reduced even further. Memory addressing methods are limited, and most operations are done in registers.

Most registers in CISC may have their own function, but those in RISC are mostly general-purpose. Furthermore, RISC designs usually have more registers overall than CISC designs.

The CPU control methods adopted by the two types of architecture are also different: CISC often uses microprograms to control the whole CPU, while RISC uses hardwired logic circuits.

2.1.1.2.1.3.7 Cache

Inside the CPU, fetching data from outside the registers is too slow, so caching frequently used data is a good idea. A cache may have multiple levels, each farther away from the core.

L1 and L2 caches are typically private to one core, while the L3 cache is typically shared by the whole CPU.

2.1.1.2.1.3.8 Memory

Memory is where most data and instructions are stored. The CPU uses it to hold data, store results and communicate with other components.

Primary memory, often RAM, random access memory, comes in different kinds. Mainly there are two kinds of RAM:

  • Static RAM, SRAM, is RAM built from flip-flops that store bits. "Static" means that SRAM needs no extra operations to keep its data. It has relatively fast access speed among all kinds of memory.

    • Sync SRAM
    • Async SRAM
    • Burst SRAM
  • Dynamic RAM, DRAM, is RAM built from capacitors. "Dynamic", in contrast, means it needs to be refreshed regularly, because capacitors lose charge over time. DRAM generally has smaller cells and a lower electrical level, but slower speed.

    • DDR
    • LPDDR

On the other hand, memory can also be distinguished by its Error Checking and Correction ability:

  • Regular memory
  • ECC memory

Recently (but not that recently), a new kind of memory appeared: Optane memory, which can keep its data even after power-off.

Devices other than main memory still have their own memory. For example, hard drives may have their own cache (a memory) for exchanging information with the CPU.

2.1.1.2.1.3.8.1 Address

Memory is a physical device, but it is not practical to access memory through its physical details. Otherwise, every program vendor would have to provide a different program build for every combination of memory, CPU and other hardware, taking into account the size of the memory, its design, even its ID.

So mapping physical memory units into logical memory is essential. In a computer, we assume memory is continuous, no matter how many memory modules are installed and no matter what size each module has. We then split this continuous space into pieces of the same logical size and assign each logical piece an ID for reference. These IDs for memory space are just like the IDs of bank safes: by accessing the corresponding safe, we can store things in it or withdraw things from it.

You may even store inside a safe an ID that represents another safe, and then find the other safe through the one you hold.

Other memories (or special devices that can be abstracted as memory) are also mapped and concatenated into the logical memory. The CPU can then access those devices without specifying their hardware information.

This ID is called an "address". Every address indexes one piece of memory.
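
A minimal sketch of this idea, assuming two hypothetical memory modules of 4 and 8 cells that are concatenated into one logical address space. The names `MODULE_SIZES`, `locate`, `store` and `load` are invented for this example.

```python
MODULE_SIZES = [4, 8]  # hypothetical: two modules holding 4 and 8 cells
modules = [[0] * size for size in MODULE_SIZES]


def locate(address):
    """Translate a logical address into (module index, offset in that module)."""
    if address < 0:
        raise IndexError("negative address")
    for index, size in enumerate(MODULE_SIZES):
        if address < size:
            return index, address
        address -= size  # skip past this module
    raise IndexError("address outside the mapped memory")


def store(address, value):
    index, offset = locate(address)
    modules[index][offset] = value


def load(address):
    index, offset = locate(address)
    return modules[index][offset]


store(9, 42)          # put a value into the "safe" with id 9
store(2, 9)           # safe 2 holds the id of safe 9
print(locate(9))      # (1, 5)
print(load(load(2)))  # 42, found by following the stored id
```

The program only ever sees addresses 0 to 11; which module actually holds a cell is hidden behind `locate`.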

2.1.1.2.1.3.8.2 Bytes, Word, Double Word and Half-Word

In assembly, or CPU design, there is another set of measurements for data:

Name         Conversion  From
bit          /           None
Byte         8           bit
Half         4           bit
Word         2           Byte
Double Word  2           Word
Quad Word    4           Word
Paragraph    8           Word

These units measure the data a computer can manipulate DIRECTLY.

2.1.1.2.1.3.8.3 Direct Memory Access

Most of the time the CPU does calculation work, which takes relatively little time. But when the CPU has to access memory or other devices, it must spend multiple cycles fetching data and transferring data to, from and between memories.

Thus it is natural to have a specially designed device fetch data for the CPU. When data has to be fetched from a peripheral, the DMA controller takes over this job and copies the data from the device into memory, while the CPU continues its own calculation work.

2.1.1.2.1.3.9 ROM

Besides main memory, there is another kind of data storage: ROM, Read-Only Memory.

This kind of memory can store data without electrical refresh. Even power-off does not delete the data, so it is often used for BIOS storage.

Over time, ROM developed into EPROM, EEPROM and NAND Flash: EPROM can be erased using ultraviolet light and rewritten with a special tool, while EEPROM can be erased and rewritten purely electrically. NAND Flash is the basis of USB flash drives and SSDs.

2.1.1.2.1.3.10 Storage

Hard drives, together with old-school floppy drives, are the storage of a computer. They have larger capacity and keep data more reliably than memory, so they are responsible for keeping data persistently.

But storage is much slower than memory.

2.1.1.2.1.3.11 BUS

How does the CPU actually access the data it wants and reach the devices it needs?

In modern computer systems, the CPU communicates with other devices through a BUS.

Why do we need a BUS rather than another communication architecture?

  • A BUS decreases complexity: in other systems, such as direct point-to-point communication, N devices need at least C(N, 2) = N(N−1)/2 circuits. With a BUS, the N-N network topology is reduced to an N-1-N topology, or an N-1-Adapter-1-N bus-star topology.
  • A BUS also standardizes the interfaces of devices. Before PCIe, there were many different connectors for devices.
2.1.1.2.1.3.11.1 Address BUS

The Address Bus, as its name suggests, is used for transferring memory addresses. With the address bus, the CPU can reach the memory it wants.

The Address Bus transfers address information, and only in one direction, from the controller to the terminal device. The width of the address bus determines the largest memory space a computer can address.

With a 32-bit address bus and byte addressing, the CPU can address at most 2^32 bytes = 4 GiB of data.
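
The relation between bus width and addressable space is a single power of two. A small sketch, assuming byte addressing by default; the function name `addressable_bytes` is made up for this example.

```python
def addressable_bytes(address_bits: int, bytes_per_address: int = 1) -> int:
    """Largest memory reachable with an address bus of the given width."""
    # n address lines can express 2**n distinct addresses.
    return (2 ** address_bits) * bytes_per_address


print(addressable_bytes(32) // 2**30)  # 4 (GiB) for a 32-bit address bus
print(addressable_bytes(20) // 2**20)  # 1 (MiB) for the 20-bit bus of the 8086
print(addressable_bytes(16) // 2**10)  # 64 (KiB) for a 16-bit address bus
```

Each extra address line doubles the reachable space, so going from 32 to 33 bits already gives 8 GiB.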

2.1.1.2.1.3.11.2 Data BUS

The Data Bus transfers the actual data. After the CPU specifies the address of the wanted data on the Address Bus, the terminal device returns the data stored there to the CPU over the Data Bus. The CPU may also write its results to memory over the Data Bus.

The Data Bus can transfer data in both directions, whether the data comes from the CPU and is written to a terminal device, or comes from a terminal device and is fetched by the CPU. The width of the Data Bus limits the maximum size of data the CPU can fetch or write in one transfer.

For a CPU with 64-bit registers and a 64-bit Data Bus, a whole register can be stored in one transfer.

2.1.1.2.1.3.11.3 Control BUS

The Control Bus transfers control and status signals. Both sides can send and receive signals over the Control Bus. The width of the Control Bus can affect the operations available to the CPU.

Signals sent over the Control Bus control the behaviour of devices. For example, a read or write signal sent to storage instructs it to read some data or to store some data. Signals sent by terminal devices may also affect the CPU. For example, an I/O-finished interrupt signal tells the CPU that some data has finished being read.

2.1.1.2.1.3.11.4 Dual Independent BUS: North, South Bridge

In a traditional bus system, one bus connects all components of a computer. This wastes a lot of time during I/O transfers.

It is therefore possible to separate high-speed devices and low-speed devices onto two buses.

The Back Side Bus, inside the CPU, connects the core parts of the CPU: ALU, CU and so on. The Front Side Bus, outside the CPU, connects the CPU with the North and South Bridges.

  • North Bridge: connects the CPU with high-speed devices, such as main memory and high-speed caches.
  • South Bridge: connects the North Bridge with low-speed devices.

    • PCI: high speed I/O devices
    • ISA: low speed I/O devices
2.1.1.2.1.3.12 Stack

Since memory is logically presented as one large continuous space, finding methods to manage data in it is a big problem.

A simple way to manage data is a stack.

A stack is a linear first-in-last-out data structure. We first choose an address as the base of the stack, and then we can push data onto the stack and pop data off it. In addition, it is possible to index an element inside the stack by its offset.

2.1.1.2.1.3.12.1 Stack grows downwards

In a computer, continuous memory has addresses. Addresses with larger values can be seen as high addresses, and thus we can define the direction of the stack.

In general, we choose a higher address as the base of the stack, so that stack growth moves toward lower addresses.

Why the stack always chooses a higher address: https://github.com/mujiu555/Wishful-Thinking/blob/mujiu555@feat/c/doc/root/c/typ/S1.typ.

2.1.1.2.1.3.12.2 Push

A push operation makes the stack grow. It adds the new element on top of the stack and advances the stack-top pointer (toward lower addresses when the stack grows downwards).

2.1.1.2.1.3.12.3 Pop

A pop operation makes the stack shrink. It copies the value stored at the top to somewhere else, and then moves the stack-top pointer back toward the base.
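
A downward-growing stack can be sketched on top of a plain list used as memory. The class name `Stack` and its fields are invented for this example. One assumption to note: this sketch follows the x86 convention of moving the pointer first and then storing the value, so the pointer always addresses the current top element.

```python
class Stack:
    """A stack whose base is the highest address and which grows downwards."""

    def __init__(self, memory_size=16):
        self.memory = [0] * memory_size
        self.base = memory_size  # one past the highest cell
        self.sp = self.base      # empty stack: pointer equals the base

    def push(self, value):
        if self.sp == 0:
            raise OverflowError("stack overflow")
        self.sp -= 1             # growing moves toward lower addresses
        self.memory[self.sp] = value

    def pop(self):
        if self.sp == self.base:
            raise IndexError("stack underflow")
        value = self.memory[self.sp]
        self.sp += 1             # shrinking moves back toward the base
        return value

    def peek(self, offset=0):
        """Read the element `offset` cells away from the top without popping."""
        address = self.sp + offset
        if not self.sp <= address < self.base:
            raise IndexError("offset outside the stack")
        return self.memory[address]


stack = Stack()
stack.push(1)
stack.push(2)
print(stack.sp)       # 14, two cells below the base at 16
print(stack.peek(1))  # 1, indexed by offset from the top
print(stack.pop())    # 2, last in, first out
```

After a pop the old value is still physically in memory; only the pointer has moved, so the next push simply overwrites it.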

2.1.1.2.1.3.13 Registers

Registers are the most basic functional units of a CPU. They store data and feed it into calculations.

The following registers are commonly used in 8086, i386, x86, ia32 and amd64 (x86_64).

2.1.1.2.1.3.13.1 AX(Accumulator), BX(Base Address), CX(Counter), DX(Data)

In x86_64, there are four classic general-purpose registers: *AX, *BX, *CX and *DX.

These general-purpose registers can be divided and used as smaller registers.

Name          Representation  64-bit  32-bit  16-bit  8-bit
Accumulator   *AX             RAX     EAX     AX      AH, AL
Base Address  *BX             RBX     EBX     BX      BH, BL
Counter       *CX             RCX     ECX     CX      CH, CL
Data          *DX             RDX     EDX     DX      DH, DL
  • The *AX register usually takes part in calculation; it stores results of mul and div operations and the return value of function calls.
  • The *BX register usually takes part in base addressing, used as a base or offset for memory access.
  • The *CX register is usually treated as a counter, and is automatically decremented by the loop instruction.
  • The *DX register usually carries arguments and is used in I/O operations.
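
How one register is divided into smaller ones can be modelled with bit masks. This is only a model written for this page (the class `Accumulator` and its methods are invented names), but the two write rules it encodes are real x86_64 behaviour: writing AL leaves the other bits untouched, while writing EAX zero-extends into the full RAX.

```python
class Accumulator:
    """Model of RAX and its aliases EAX, AX, AH and AL."""

    def __init__(self, value=0):
        self.rax = value & 0xFFFFFFFFFFFFFFFF  # keep 64 bits

    @property
    def eax(self):
        return self.rax & 0xFFFFFFFF           # low 32 bits

    @property
    def ax(self):
        return self.rax & 0xFFFF               # low 16 bits

    @property
    def ah(self):
        return (self.rax >> 8) & 0xFF          # bits 8..15

    @property
    def al(self):
        return self.rax & 0xFF                 # bits 0..7

    def write_al(self, value):
        # An 8-bit write leaves all other bits unchanged.
        self.rax = (self.rax & ~0xFF) | (value & 0xFF)

    def write_eax(self, value):
        # A 32-bit write clears the upper 32 bits of the 64-bit register.
        self.rax = value & 0xFFFFFFFF


reg = Accumulator(0x1122334455667788)
print(hex(reg.eax))  # 0x55667788
print(hex(reg.ax))   # 0x7788
print(hex(reg.ah))   # 0x77
print(hex(reg.al))   # 0x88
```

The smaller names are therefore not separate storage; they are views onto the same bits.
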
2.1.1.2.1.3.13.2 CS:IP(Code Segment: Instruction Pointer)
2.1.1.2.1.3.13.3 SS:BP, SS:SP (Stack Segment: Base Pointer, Stack Segment: Stack Pointer)
2.1.1.2.1.3.13.4 SI, DI (Source Index, Destination Index)
2.1.1.2.1.3.13.5 DS (Data Segment)
2.1.1.2.1.3.13.6 ES (Extra Segment)
2.1.1.2.1.3.13.7 FLAGs
2.1.1.2.1.3.13.8 R8, R9, R10, …, R15
2.1.1.2.1.3.14 Heap
2.1.1.2.1.4 Syntax
2.1.1.2.1.4.1 Operator, Operand
2.1.1.2.1.4.2 Comment
2.1.1.2.1.4.3 Memory Access
2.1.1.2.1.4.4 Labels
2.1.1.2.1.4.5 Macro
2.1.1.3  MIT 18.404j Theory of Computation (junior) [S1]
2.1.1.3.1 Applications
2.1.1.3.2 Models of computation

A model captures the important aspects of the thing we try to understand.

2.1.1.3.2.1 Finite Automata

They use little memory and have limited ability of computation.

Each one has its own

  • States: 𝑞1, 𝑞2, 𝑞3
  • Transitions: arrows labeled with input symbols, e.g. 1
  • Start state: 𝑞1
  • Accept state: 𝑞3

It takes a finite string as input and outputs accept or reject.

Begin at the start state, read the input symbols and follow the corresponding transitions. Accept if the input ends in an accept state, reject if not.

We say that "𝑀1 accepts exactly those strings in 𝐴, where 𝐴={𝑤|𝑤 contains substring 11}". We say that 𝐴 is the language of 𝑀1, that 𝑀1 recognizes 𝐴, and that 𝐴=𝐿(𝑀1).

2.1.1.3.2.1.1 Define a finite automaton

Defn: A finite automaton M is a 5-tuple (𝑄,Σ,𝛿,𝑞0,𝐹):

  • Q: finite set of states
  • Σ: finite set of alphabet symbols
  • 𝛿: transition function 𝛿: 𝑄×Σ→𝑄. 𝛿 is a kind of relation: given a state and an input symbol, it returns a (maybe) new state. E.g. 𝛿(𝑞,𝑎)=𝑟
  • 𝑞0: start state
  • 𝐹: set of accept states

For the example above:

  • 𝑀1=(𝑄,Σ,𝛿,𝑞1,𝐹),
  • 𝑄={𝑞1,𝑞2,𝑞3},
  • Σ={0,1},
  • 𝐹={𝑞3}.

And 𝛿 is given by the transitions in the state diagram of 𝑀1 (figure not reproduced).

2.1.1.3.2.1.2 String and languages
  • A string (word) is a finite sequence of symbols in Σ (the alphabet),
  • A language is a set of strings (finite or infinite),
  • The empty string 𝜀 is the string of length 0,
  • The empty language is the set with no strings.

Defn: M accepts string 𝑤=𝑤1𝑤2…𝑤𝑛, each 𝑤𝑖∈Σ, if there is a sequence of states 𝑟0,𝑟1,…,𝑟𝑛∈𝑄 where:

  • 𝑟0=𝑞0, the state sequence starts at the initial state,
  • 𝑟𝑖=𝛿(𝑟_{𝑖−1},𝑤𝑖) for 1≤𝑖≤𝑛, each state follows from the previous one by the transition function,
  • 𝑟𝑛∈𝐹, the sequence must end in an accept state.
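
The definition of acceptance translates almost line by line into code. The transition table below is an assumption: the original state diagram is not available here, so `DELTA` is reconstructed as the usual three-state machine for "contains substring 11", with start state q1 and accept state q3 as listed above.

```python
# Assumed transition function for M1 (reconstructed, not taken from the diagram).
DELTA = {
    ("q1", "0"): "q1", ("q1", "1"): "q2",  # q1: no 1 seen yet
    ("q2", "0"): "q1", ("q2", "1"): "q3",  # q2: the last symbol was a 1
    ("q3", "0"): "q3", ("q3", "1"): "q3",  # q3: 11 already seen
}
START = "q1"
ACCEPT = {"q3"}


def accepts(w: str) -> bool:
    """Follow r_i = delta(r_{i-1}, w_i) and check that r_n is in F."""
    state = START                        # r_0 = q_0
    for symbol in w:
        state = DELTA[(state, symbol)]   # exactly one next state
    return state in ACCEPT               # r_n in F


print(accepts("0110"))  # True
print(accepts("0101"))  # False
print(accepts(""))      # False
```

Because the machine is deterministic, the state sequence is unique, so a single variable is enough to track it.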

Recognizing languages:

  • 𝐿(𝑀)={𝑤|𝑀 accepts 𝑤},
  • 𝐿(𝑀) is the language of 𝑀
  • M recognizes L(M)

Every machine may accept many strings, but it recognizes only one language.

Defn: A language is regular if some finite automaton recognizes it.

2.1.1.3.2.1.3 Regular Languages

𝐿(𝑀1)={𝑤|𝑤 contains substring 11}=𝐴, so 𝐴 is regular.

2.1.1.3.3 Regular Expressions
2.1.1.3.3.1 Regular Operations

Let A, B be languages:

  • Union: 𝐴∪𝐵={𝑤|𝑤∈𝐴 or 𝑤∈𝐵},
  • Concatenation: 𝐴∘𝐵={𝑥𝑦|𝑥∈𝐴 and 𝑦∈𝐵}=𝐴𝐵,
  • Kleene star (a unary operation): 𝐴*={𝑥1…𝑥𝑘| each 𝑥𝑖∈𝐴, for 𝑘≥0}; note that 𝜀∈𝐴* always.

Note: the empty language does not contain the empty string, but the Kleene star of the empty language does: ∅*={𝜀}.
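
For finite languages the three operations can be tried out directly with Python sets. Since the Kleene star of a non-empty language is usually infinite, the `star` function here is cut off at a maximum string length; that bound, like the function names, is an assumption of this sketch.

```python
def union(a: set, b: set) -> set:
    return a | b


def concat(a: set, b: set) -> set:
    return {x + y for x in a for y in b}


def star(a: set, max_len: int) -> set:
    """All concatenations of strings from a, up to total length max_len."""
    result = {""}    # k = 0: the empty string is always included
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more string from a.
        frontier = {x + y for x in frontier for y in a
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result


print(sorted(union({"0"}, {"1"})))        # ['0', '1']
print(sorted(concat({"a", "b"}, {"c"})))  # ['ac', 'bc']
print(sorted(star({"0", "1"}, 2)))        # ['', '0', '00', '01', '1', '10', '11']
print(star(set(), 3))                     # {''}
```

The last line is the note above in action: the empty language has no strings, yet its star contains exactly the empty string.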

2.1.1.3.3.2 Regular expression

Just as mathematical expressions are built by combining mathematical operations and mathematical elements, regular expressions are built by combining regular operations and languages.

  • Built from Σ (the alphabet), members of Σ, ∅ (the empty language) and 𝜀 (the empty word) [atomic]
  • Using ∪, ∘, * [composite]

E.g., (0∪1)*=Σ* gives all strings over Σ.

Finite automata are equivalent to regular expressions.

2.1.1.3.4 Closure Properties for regular languages

A set is closed under an operation if applying the operation to objects of the set always gives a result that still lies in the same set.

2.1.1.3.4.1 Union:

If 𝐴1,𝐴2 are regular languages, so is 𝐴1∪𝐴2 (closure under ∪).

Proof: let 𝑀1=(𝑄1,Σ,𝛿1,𝑞1,𝐹1) recognize 𝐴1,
and 𝑀2=(𝑄2,Σ,𝛿2,𝑞2,𝐹2) recognize 𝐴2.
Construct 𝑀=(𝑄,Σ,𝛿,𝑞0,𝐹) recognizing 𝐴1∪𝐴2;
𝑀 should accept input 𝑤 if either 𝑀1 or 𝑀2 accepts 𝑤.

Run 𝑀1 and 𝑀2 together; the components of 𝑀 are:

  • 𝑄=𝑄1×𝑄2={(𝑞1,𝑞2)|𝑞1∈𝑄1 and 𝑞2∈𝑄2},
  • 𝑞0=(𝑞1,𝑞2),
  • 𝛿((𝑞,𝑟),𝑎)=(𝛿1(𝑞,𝑎),𝛿2(𝑟,𝑎)),
  • 𝐹=(𝐹1×𝑄2)∪(𝑄1×𝐹2).

Note: if instead 𝐹=𝐹1×𝐹2, the same construction shows closure under intersection.
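
The product construction can be written down directly. In this sketch a DFA is a tuple `(states, alphabet, delta, start, accept)` with `delta` as a dictionary; that representation and the two example machines (`EVEN_ZEROS`, `ENDS_WITH_ONE`) are made up for illustration.

```python
from itertools import product


def product_dfa(dfa1, dfa2, mode="union"):
    """Build the product machine; mode is "union" or "intersection"."""
    q1s, sigma, d1, s1, f1 = dfa1
    q2s, _, d2, s2, f2 = dfa2
    states = set(product(q1s, q2s))                  # Q = Q1 x Q2
    delta = {((q, r), a): (d1[(q, a)], d2[(r, a)])   # step both machines
             for (q, r) in states for a in sigma}
    if mode == "union":
        accept = {(q, r) for (q, r) in states if q in f1 or r in f2}
    else:
        accept = {(q, r) for (q, r) in states if q in f1 and r in f2}
    return states, sigma, delta, (s1, s2), accept


def run_dfa(dfa, w):
    _, _, delta, state, accept = dfa
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accept


# Hypothetical example machines over the alphabet {0, 1}.
EVEN_ZEROS = ({"e", "o"}, {"0", "1"},
              {("e", "0"): "o", ("e", "1"): "e",
               ("o", "0"): "e", ("o", "1"): "o"}, "e", {"e"})
ENDS_WITH_ONE = ({"n", "y"}, {"0", "1"},
                 {("n", "0"): "n", ("n", "1"): "y",
                  ("y", "0"): "n", ("y", "1"): "y"}, "n", {"y"})

either = product_dfa(EVEN_ZEROS, ENDS_WITH_ONE, "union")
both = product_dfa(EVEN_ZEROS, ENDS_WITH_ONE, "intersection")
print(run_dfa(either, "01"))  # True: ends with 1
print(run_dfa(both, "01"))    # False: the number of 0s is odd
```

Only the accept set differs between the two modes; states and transitions are shared, which is exactly the remark about 𝐹=𝐹1×𝐹2.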

2.1.1.3.4.2 Concatenation:

If 𝐴1,𝐴2 are regular languages, so is 𝐴1𝐴2 (closure under ∘).

A first attempt: let 𝑀 accept input 𝑤 if 𝑤=𝑥𝑦 where 𝑀1 accepts 𝑥 and 𝑀2 accepts 𝑦. But a direct deterministic construction fails.

Proof attempt: Let 𝑀1=(𝑄1,Σ,𝛿1,𝑞1,𝐹1) recognize 𝐴1, and 𝑀2=(𝑄2,Σ,𝛿2,𝑞2,𝐹2) recognize 𝐴2. Construct 𝑀=(𝑄,Σ,𝛿,𝑞0,𝐹) recognizing 𝐴1𝐴2.

The machine 𝑀 should accept input 𝑤 if there is a split of 𝑤 into 𝑥𝑦 where 𝑀1 accepts 𝑥 and 𝑀2 accepts 𝑦.

Then construct 𝑀:

For an input word 𝑤, there should be a split point where 𝑀1 reaches an accept state and the computation jumps to 𝑀2 via an 𝜀 transition.

So construct a new machine by concatenating 𝑀1 and 𝑀2, with 𝜀 transitions from each accept state of 𝑀1 to the start state of 𝑀2.

But the first place where the machine reaches an accept state of 𝑀1 may not be the correct split point. 𝑀 needs to keep track of all possible split points, which is what non-determinism provides.

2.1.1.3.5 Non-determinism

A non-deterministic finite automaton is mostly the same as a deterministic finite automaton. In a deterministic finite automaton, there is exactly one transition for each pair of state and input symbol.

A non-deterministic finite automaton may have several different transitions for the same pair of state and input symbol; this is the so-called non-determinism.

You may follow one transition to one state, or another transition to another state.

It may also have epsilon transitions, which means it can go to another state without consuming any input symbol.

A non-deterministic finite automaton accepts an input if some path leads to an accept state. Accepting always takes priority over rejecting: the input is rejected only when all possible paths lead to non-accept states.

The possible computations of a non-deterministic finite automaton form a tree, since at each state there may be multiple possible transitions for the same input symbol.

E.g., for the input "ab" on the automaton above, the possible computations form such a tree (figure not reproduced).

Any path that leads to an accept state means the input is accepted.

For "aa", the automaton never reaches an accept state.

2.1.1.3.5.1 NFA

Defn: A nondeterministic finite automaton is a 5-tuple (𝑄,Σ,𝛿,𝑞0,𝐹):

  • Q: finite set of states
  • Σ: finite set of alphabet symbols
  • 𝛿: transition function 𝛿: 𝑄×Σ𝜀→𝑃(𝑄), where Σ𝜀=Σ∪{𝜀} and 𝑃(𝑄)={𝑅|𝑅⊆𝑄} is the power set of 𝑄. 𝛿 is a kind of relation: given a state and an input symbol (or epsilon), it returns a set of (maybe) new states. E.g. 𝛿(𝑞,𝑎)={𝑟,𝑠}
  • 𝑞0: start state
  • 𝐹: set of accept states

E.g., in the above example:

  • 𝛿(𝑞1,𝑎)={𝑞1,𝑞2}
  • 𝛿(𝑞1,𝑏)=∅

The computation process of an NFA is a kind of BFS:

  • Every time the machine reads an input symbol, it branches out to all possible next states.
  • If any branch is in an accept state when the input has been fully read, the machine accepts the input, regardless of all other paths.

Or you may imagine that the machine makes a good guess at each step, always choosing the correct transition to reach an accept state if there is one.
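
The breadth-first view can be sketched by tracking the set of all states the NFA may currently be in. The example automaton is hypothetical (it accepts strings over {a, b} that end in "ab") and is not the automaton from the lecture figure; epsilon transitions are stored under the empty string `EPS`, which is a convention of this sketch.

```python
EPS = ""  # label used for epsilon transitions in this sketch


def epsilon_closure(states, delta):
    """All states reachable from `states` using only epsilon transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfa_accepts(delta, start, accept, w):
    current = epsilon_closure({start}, delta)
    for symbol in w:
        moved = set()
        for q in current:                      # branch out from every state
            moved |= delta.get((q, symbol), set())
        current = epsilon_closure(moved, delta)
    return bool(current & accept)              # some branch ends in F


# Hypothetical NFA: guess in s0 that the final "ab" starts now.
DELTA = {
    ("s0", "a"): {"s0", "s1"},
    ("s0", "b"): {"s0"},
    ("s1", "b"): {"s2"},
}

print(nfa_accepts(DELTA, "s0", {"s2"}, "aab"))  # True
print(nfa_accepts(DELTA, "s0", {"s2"}, "aba"))  # False
```

A missing entry in `DELTA` plays the role of the empty set: that branch of the computation simply dies.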

2.1.1.3.6 NFA and DFA equivalence

NFAs and DFAs are equivalent in power, which means any language recognized by an NFA can also be recognized by a DFA, and vice versa.

2.1.1.3.6.1 NFA to DFA

Theorem: If an NFA recognizes a language L, then L is regular.

Proof: Let NFA 𝑀=<𝑄,Σ,𝛿,𝑞0,𝐹> recognize L.
Construct DFA 𝑀′=<𝑄′,Σ,𝛿′,𝑞0′,𝐹′> that recognizes L.

Basically, DFA 𝑀′ keeps track of the subset of states NFA 𝑀 may be in. It simulates the NFA: every time a symbol is read, DFA 𝑀′ updates its state to the set of possible states that NFA 𝑀 may reach.

The way to achieve this is to create one state of DFA 𝑀′ for every possible subset of states of NFA 𝑀. Each state of DFA 𝑀′, being a subset of the states of NFA 𝑀, remembers which states the NFA may currently be in.

Construction of DFA 𝑀:

  • 𝑄=𝑃(𝑄)={𝑅|𝑅𝑄}, the set of all possible subsets of states in NFA 𝑀.
  • 𝛿(𝑅,𝑎)={𝑞|𝑞𝛿(𝑟,𝑎) for some 𝑟𝑅},𝑅𝑄
  • 𝑞0={𝑞0}
  • 𝐹={𝑅𝑄|𝑅𝐹}

Then, DFA 𝑀 simulates NFA 𝑀 by keeping track of all possible states that NFA 𝑀 may reach after reading input string.

In practice, the construction is carried out from the start state. Begin with the subset $\{q_0\}$, which is the start state $q_0'$ of DFA $M'$. For each input symbol, collect every state that NFA $M$ may reach from some state of the current subset; the collected states form a subset of $Q$, which is a state of $M'$. Then repeat the same search from each newly constructed subset, trying every input symbol with every state in the subset.

  • If any state in the subset can reach a new state, add that new state to the new subset.
  • If any state in a subset is an accept state of NFA $M$, then the subset is an accept state of DFA $M'$.

Repeat this until no new subsets can be formed.

P.S., in the full construction over $\mathcal{P}(Q)$, some states of DFA $M'$ may be unreachable from the start state $\{q_0\}$; discard those states. Some states of $M'$ may not be able to reach any accept state; those can be considered dead states. Discard or keep them as you like.

P.S., if any state in a subset has epsilon transitions, add the states reachable via epsilon transitions to the subset as well.
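The subset construction can be sketched in C by storing each subset of NFA states as a bit set. This is only a minimal sketch under assumptions that are not in the notes: at most 8 NFA states, the alphabet {0, 1}, and a made-up example NFA (`example_ends_in_01`) that accepts strings ending in 01. All type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MAX_STATES 8   /* assumption: at most 8 NFA states */
#define ALPHABET   2   /* assumption: input alphabet is {0, 1} */

typedef uint32_t StateSet;  /* bit q is set <=> NFA state q is in the subset */

typedef struct {
  int n_states;
  int start;
  StateSet accept;                       /* F as a bit set */
  StateSet delta[MAX_STATES][ALPHABET];  /* delta(q, a) as a bit set */
  StateSet eps[MAX_STATES];              /* epsilon moves of q as a bit set */
} Nfa;

/* Add every state reachable through epsilon transitions. */
static StateSet eps_closure(const Nfa *m, StateSet set) {
  StateSet prev;
  do {
    prev = set;
    for (int q = 0; q < m->n_states; q++)
      if (set & (1u << q)) set |= m->eps[q];
  } while (set != prev);
  return set;
}

/* delta'(R, a): union of delta(r, a) over all r in R. */
static StateSet dfa_step(const Nfa *m, StateSet r, int a) {
  StateSet next = 0;
  for (int q = 0; q < m->n_states; q++)
    if (r & (1u << q)) next |= m->delta[q][a];
  return eps_closure(m, next);
}

/* Run the DFA on the fly; input must consist of '0' and '1' only. */
int nfa_accepts(const Nfa *m, const char *input) {
  StateSet cur = eps_closure(m, 1u << m->start);
  for (; *input; input++) cur = dfa_step(m, cur, *input - '0');
  return (cur & m->accept) != 0;  /* R is accepting iff it meets F */
}

/* Count the DFA states reachable from the start subset (worklist search). */
int count_reachable_subsets(const Nfa *m) {
  unsigned char seen[1u << MAX_STATES];
  StateSet work[1u << MAX_STATES];
  int top = 0, count = 0;
  StateSet s0 = eps_closure(m, 1u << m->start);
  memset(seen, 0, sizeof seen);
  seen[s0] = 1;
  work[top++] = s0;
  while (top > 0) {
    StateSet r = work[--top];
    count++;
    for (int a = 0; a < ALPHABET; a++) {
      StateSet t = dfa_step(m, r, a);
      if (!seen[t]) { seen[t] = 1; work[top++] = t; }
    }
  }
  return count;
}

/* Hypothetical example NFA: q0 loops on 0 and 1, q0 -0-> q1, q1 -1-> q2 (accept). */
Nfa example_ends_in_01(void) {
  Nfa m;
  memset(&m, 0, sizeof m);
  m.n_states = 3;
  m.start = 0;
  m.accept = 1u << 2;
  m.delta[0][0] = (1u << 0) | (1u << 1);
  m.delta[0][1] = 1u << 0;
  m.delta[1][1] = 1u << 2;
  return m;
}
```

Only the subsets found by the worklist are ever created, which is the "discard unreachable states" remark in practice. For the example NFA only the subsets {q0}, {q0, q1} and {q0, q2} are reached, instead of all 8.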

E.g., for NFA above:

Since no transition leads to the dead states {𝑞2}, {𝑞3} or {𝑞4}, those states can be discarded.

[Figure: the resulting DFA]

2.1.1.3.6.2 Recall for Closure Properties
  • Union: Construct a new NFA with a new start state that connects to the start states of the two NFAs via epsilon transitions.
  • Concatenation: Construct a new NFA that connects each accept state of the first NFA to the start state of the second NFA via epsilon transitions. The accept states of the first NFA are no longer accept states.
  • Star: Construct a new NFA that connects each accept state of the NFA back to the start state via epsilon transitions. Also, add a new start state that is also an accept state, and connect it to the old start state via an epsilon transition.
2.1.1.3.6.3 Regular Expression to NFA

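Theorem: If R is a regular expression and $A = L(R)$, then A is regular.

Proof:

Basically, convert R to an equivalent NFA $M$:

  • If R is atomic:

    • $R = a$ for a symbol $a \in \Sigma$: a start state with a single transition on $a$ to an accept state.
    • $R = \varepsilon$: a single start state that is also an accept state.
    • $R = \emptyset$: a single start state and no accept state.
  • If R is composite:

    • $R = R_1 \cup R_2$: NFAs for $R_1$ and $R_2$ exist by induction, so the union construction gives an NFA for $R$.
    • $R = R_1 \circ R_2$: NFAs for $R_1$ and $R_2$ exist by induction, so the concatenation construction gives an NFA for $R$.
    • $R = R_1^*$: an NFA for $R_1$ exists by induction, so the star construction gives an NFA for $R$.

Then, by structural induction on R, we can show that NFA $M$ recognizes A.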

2.1.1.3.6.4 Generalized NFA

Similar to an NFA, but with more complex transitions: a GNFA allows transitions labeled with regular expressions.

Assume:

  • one accept state, separate from the start state: connect all old accept states to a new accept state via epsilon transitions, and treat the old accept states as normal states.
  • one arrow from each state to each state, except:

    • arrows only exit the start state
    • arrows only enter the accept state
    • states without a transition between them are connected via $\emptyset$ transitions.
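2.1.1.3.6.5 NFA to regular expression

Conversely, if a language L is regular, then there is a regular expression R such that $L = L(R)$.

Lemma: Every GNFA G has an equivalent regular expression R.

Proof:

By induction on the number of states k in GNFA G.

Basis (k = 2): G consists of a start state and an accept state joined by a single arrow labeled r. Let R = r.

Induction step (k > 2): Assume the lemma is true for k - 1 states and prove it for k states.

Convert the k-state GNFA G to a (k - 1)-state GNFA G' by removing one state q_rip that is neither the start nor the accept state, and repair every path that went through q_rip: if the arrows q_i -> q_rip, q_rip -> q_rip, q_rip -> q_j and q_i -> q_j are labeled $r_1$, $r_2$, $r_3$ and $r_4$, the new arrow q_i -> q_j is labeled $r_1 r_2^* r_3 \cup r_4$.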

2.1.1.3.7 Non-regular languages
2.1.1.3.7.1 Pumping Lemma for regular languages

To show a language is regular, just give a finite automaton or a regular expression.

To show a language is non-regular, give a proof by contradiction with pumping lemma.

The pumping lemma for regular languages describes a property that all regular languages must satisfy. If a language fails to satisfy this property, then it is non-regular.

Pumping Lemma: For every regular language A, there is a number p (the pumping length) such that if $s \in A$ and $|s| \ge p$ then $s = xyz$ where

  • $xy^iz \in A$ for all $i \ge 0$,
  • $y \neq \varepsilon$ (y is not empty),
  • $|xy| \le p$,

Informally, any sufficiently long string in a regular language can be pumped (have a middle section repeated any number of times) and still be in the language.

Or, if there is a substring that can be repeated any number of times to produce new strings in the language, then the language may be regular.

The pumping lemma depends on the fact that if M has p states and runs for at least p steps, it will enter some state at least twice (by the pigeonhole principle).

2.1.1.3.7.2 Using pumping lemma to show non-regularity
2.1.1.3.7.2.1 $D = \{0^n 1^n \mid n \ge 0\}$

Let $D = \{0^n 1^n \mid n \ge 0\}$. Show: D is not regular.

Proof by contradiction: Assume D is regular. Then, by the pumping lemma, there is a pumping length p. Let $s = 0^p 1^p \in D$, thus $|s| = 2p \ge p$.

And the pumping lemma says that $s = xyz$ where

  • $xy^iz \in D$ for all $i \ge 0$,
  • $y \neq \varepsilon$,
  • $|xy| \le p$,

Since $|xy| \le p$, $x$ and $y$ contain only 0s, so $y = 0^k$ for some $k \ge 1$. But $xyyz$ has more 0s than 1s, thus $xyyz \notin D$, contradiction.

Therefore the assumption is false, D is not regular.
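The argument can be checked mechanically for one concrete pumping length. The sketch below is only an illustration, not a proof: for a fixed small p it tries every split s = xyz of 0^p 1^p with |xy| <= p and |y| >= 1 and confirms that xyyz falls out of D. The function names and the bound p <= 16 are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Membership test for D = { 0^n 1^n | n >= 0 }. */
int in_D(const char *s) {
  size_t len = strlen(s), i = 0;
  while (i < len && s[i] == '0') i++;  /* leading 0s */
  size_t zeros = i;
  while (i < len && s[i] == '1') i++;  /* following 1s */
  return i == len && zeros == len - zeros;
}

/* Write x y^reps z into out, where x = s[0, xlen) and y = s[xlen, xlen + ylen).
   out must be large enough for the pumped string plus the terminator. */
void pump(const char *s, size_t xlen, size_t ylen, unsigned reps, char *out) {
  size_t n = strlen(s);
  size_t zlen = n - xlen - ylen;
  memcpy(out, s, xlen);
  out += xlen;
  for (unsigned r = 0; r < reps; r++) {
    memcpy(out, s + xlen, ylen);
    out += ylen;
  }
  memcpy(out, s + xlen + ylen, zlen);
  out[zlen] = '\0';
}

/* Returns 1 if, for s = 0^p 1^p, no split with |xy| <= p and |y| >= 1
   keeps x y y z inside D. Only supports 1 <= p <= 16. */
int no_split_survives(unsigned p) {
  char s[2 * 16 + 1], pumped[3 * 16 + 1];
  if (p == 0 || p > 16) return 0;
  memset(s, '0', p);
  memset(s + p, '1', p);
  s[2 * p] = '\0';
  for (size_t xlen = 0; xlen < p; xlen++)
    for (size_t ylen = 1; xlen + ylen <= p; ylen++) {
      pump(s, xlen, ylen, 2, pumped);
      if (in_D(pumped)) return 0;
    }
  return 1;
}
```

The real proof has to hold for every p at once, which is exactly what the symbolic argument above does; the code only mirrors its case analysis.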

2.1.1.3.7.2.2 $F = \{ww \mid w \in \Sigma^*\}$, $\Sigma = \{0, 1\}$.

Let $F = \{ww \mid w \in \Sigma^*\}$, $\Sigma = \{0, 1\}$. Show F is not regular.

Proof by contradiction: Assume F is regular. Then, by the pumping lemma, there is a pumping length p. Let $s = 0^p 1 0^p 1 \in F$.

According to the pumping lemma, $s = xyz$ where

  • $xy$ lies within the leading 0s of the first half of s, because $|xy| \le p$,

So $xyyz$ has more 0s before its first 1 than between its two 1s, which means $xyyz \notin F$.

Contradiction found, thus F is not regular.

2.1.1.3.7.2.3 $B = \{w \mid w \text{ has an equal number of 0s and 1s}\}$

Let $B = \{w \mid w \text{ has an equal number of 0s and 1s}\}$. Show B is not regular.

Proof by contradiction: Assume B is regular.

Since we know that $0^*1^*$ is regular, $C = B \cap 0^*1^* = \{0^n 1^n \mid n \ge 0\}$ is also regular (by closure under intersection).

But C is exactly the language D, which we have already shown is not regular.

Contradiction found, thus B is not regular.

2.1.1.3.8 Context-free languages

Context-free grammars are more powerful than finite automata.

A grammar is composed of variables and rules.

  • rule: variable -> string of variables and terminals
  • variable: non-terminal symbol, appears on the left side of some rule
  • terminal: symbol in the alphabet or epsilon, appears only on the right side of rules
  • start variable: the designated variable from which every derivation begins (by convention, the left side of the first rule)

A grammar generates strings by starting with the start variable, then repeatedly replacing some variable with the right side of one of its rules, until there is no variable left.

The terminals are the building blocks of the final strings generated by the grammar.

2.1.1.3.8.1 Parse Trees

Start at the root with the start variable; then, for each rule applied, create a child node for each symbol on the right side of the rule.

When all leaves are terminals, the parse tree is complete.

2.1.1.3.8.2 Formal definition of CFG

Defn: A context-free grammar G is a 4-tuple $(V, \Sigma, R, S)$ where

  • V: finite set of variables
  • Σ: finite set of terminal symbols, disjoint from V
  • R: finite set of rules of the form $A \to \gamma$ where $A \in V$ and $\gamma \in (V \cup \Sigma)^*$
  • S: start variable, $S \in V$

For $u, v \in (V \cup \Sigma)^*$:

  1. $u \Rightarrow v$: u yields v if it can go from u to v in one substitution step in G
  2. $u \Rightarrow^* v$: u derives v if it can go from u to v in zero or more substitution steps in G, i.e. $u \Rightarrow u_1 \Rightarrow u_2 \Rightarrow \cdots \Rightarrow v$, called a derivation of v from u. If $u = S$, then it is a derivation of v in G.

$L(G) = \{w \in \Sigma^* \mid S \Rightarrow^* w\}$, the language generated by G.

Defn: A is a context-free language if there is a CFG G such that $A = L(G)$.

2.1.1.3.8.3 Ambiguity

For some CFGs, a string in the language may have more than one parse tree, or equivalently more than one leftmost derivation (or more than one rightmost derivation). Such a grammar is called ambiguous.

2.1.1.3.8.4 PDA: pushdown automata

A pushdown automaton is a finite automaton extended with a stack as memory.

A PDA has a finite control and an input tape; the head reads the input from left to right.

PDAs are mostly similar to finite automata, but with a stack.

The limitation of finite automata is their limited memory. With a stack, a PDA has unlimited memory, although it can only be used in a restricted way: the PDA can push data onto the stack and pop data off the stack.

A PDA accepts only if it is in an accept state at the end of the input.

Defn: A pushdown automaton is a 6-tuple $\langle Q, \Sigma, \Gamma, \delta, q_0, F \rangle$,

  • Σ: input alphabet
  • Γ: stack alphabet
  • 𝛿: transition function: $Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to \mathcal{P}(Q \times \Gamma_\varepsilon)$, e.g. $\delta(q, a, c) = \{(r_1, d), (r_2, e)\}$. Epsilon here means reading no input symbol, or reading nothing from (writing nothing to) the stack.

E.g., $B = \{ww^R \mid w \in \{0, 1\}^*\}$, and sample input: 011110

  1. Read input symbols and push them onto the stack; nondeterministically either repeat or go to 2.
  2. Read input symbols and pop stack symbols, comparing them; if they are ever not equal, this thread rejects.
  3. Enter the accept state if the stack is empty at the end of the input.

Assume that every time the computation forks, the stack is duplicated for each branch.
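The three steps can be simulated deterministically by trying every possible guess for the middle of the input, giving each branch its own private stack. This is a minimal sketch, assuming inputs of at most 64 symbols; the function names are made up for this example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_INPUT 64  /* assumption: inputs longer than this are rejected */

/* One nondeterministic branch: the guess is that the middle is at `mid`. */
static int branch_accepts(const char *s, size_t len, size_t mid) {
  char stack[MAX_INPUT];  /* each branch owns a separate stack */
  size_t top = 0;
  for (size_t i = 0; i < mid; i++)   /* step 1: read and push */
    stack[top++] = s[i];
  for (size_t i = mid; i < len; i++) /* step 2: read, pop and compare */
    if (top == 0 || stack[--top] != s[i]) return 0;
  return top == 0;                   /* step 3: stack empty at end of input */
}

/* Accept iff some branch accepts, i.e. s = w w^R for some w. */
int pda_accepts_wwr(const char *s) {
  size_t len = strlen(s);
  if (len > MAX_INPUT) return 0;
  for (size_t mid = 0; mid <= len; mid++)
    if (branch_accepts(s, len, mid)) return 1;
  return 0;
}
```

For the sample input 011110 the branch with mid = 3 pushes 0, 1, 1 and then pops 1, 1, 0 against the remaining input, so that branch accepts.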

2.1.1.3.8.5 Convert CFG to PDA

Theorem: If A is a CFL then some PDA recognizes A. Proof: Convert A's CFG to a PDA.

IDEA: The PDA begins with the start variable and guesses substitutions. It keeps the intermediate generated string on the stack and compares it with the input.

P.S., the stack is used as a kind of cache for the intermediate generated string.

Whenever a terminal is on top of the stack, pop it and compare it with the next input symbol; continue until a variable is on top of the stack.

  1. Push the start symbol onto the stack.
  2. If the top of the stack is a variable, nondeterministically choose a rule with that variable on the left side, pop the variable and push the right side of the rule onto the stack (leftmost symbol on top). Else if the top of the stack is a terminal, pop it and compare it with the next input symbol: if they are equal, read on, otherwise this branch rejects. Repeat this step.
  3. If both the input and the stack are empty, accept.
2.1.1.3.8.6 Convert PDA to CFG

Theorem: A is a CFL iff some PDA recognizes A. The proof needs both directions: every PDA can be converted to a CFG, and every CFG can be converted to a PDA.

Proof:

2.1.1.4  The Missing Semester of Computer Education [S1]
2.1.1.4.1 Section I

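Besides algorithms, tools can greatly improve productivity. This is an attempt to teach how to master those tools, and to introduce tools that you may not know about but that are useful.

It covers many areas (11).

Only a small number of extremely useful tools are introduced.

2.1.1.4.2 Shell

The shell is an important way to interact with a computer.

It can combine text operations:

  1. Commands can be typed directly into the shell
  2. Arguments can temporarily change how a program behaves (the program itself decides how)
  3. Arguments are separated by spaces

The shell finds the commands it can run through PATH. PATH is the way executables are located on the computer.

  1. Absolute path: the full path of a file
  2. Relative path: the path of a file relative to the current working directory (pwd)
  3. . : the current directory
  4. .. : the parent directory
  5. ~ : the home directory
  6. - : the directory before the last cd

Command arguments:

  1. Usually listed by --help
  2. - flag: usually a short switch; short switches can be combined freely
  3. Square brackets usually mean that the content is optional

Permissions:

  1. The first character shows the file type: directory / regular file / socket
  2. ugo: owner (user), owning group, others
  3. Nine bits in total: each group of three bits gives the read / write / execute permission of the corresponding user (group)
  4. Write permission on a directory only controls whether files inside it can be created, deleted, or renamed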
2.1.1.4.2.1 Most used Commands
  1. mv :
  2. cp :
  3. mv :
  4. mkdir :
  5. rmdir :
  6. man :

PS. info :

Shortcut: Ctrl-L : clear

2.1.1.4.2.2 Shell Stream

I/O stream redirection

  1. < file : input redirection (from file)
  2. > file : output redirection (into file)
  3. PS. << label : input redirection (until label is read)
  4. >> file : output appended to file
  5. prog1 | prog2 : pipe, output redirected to another program
2.1.1.4.2.3 Root user

Super administrator

sudo : super user do (do as super user)

2.1.1.4.2.4 /sys

vfs: kernel variables

  1. tee : cat file | sudo tee /target
2.1.1.4.2.5 Shell Scripting

Syntax,

2.1.1.4.3 VIM

I use NeoVim (LazyVim distro)

Repository: [My LazyVim](https://github.com/mujiu555/my-lazyvim)

2.1.1.4.4 Section IV
  • grep

  • less

  • sed

    • regular expression
  • sort

  • head

  • tail

  • uniq

  • wc

  • paste

  • awk

  • bc

  • xargs

  • parallel

2.1.1.4.5 Command Line

Shortcuts for the shell

  • Ctrl-C: SIGINT
  • Signal: kill
2.1.1.4.5.1 Tmux
2.1.1.4.5.2 Dot files

Configurations

  • alias
  • .bashrc
  • PS1
2.1.1.4.6 Debugging
2.1.1.4.6.1 Logger

Log, like printf, but with more information.

It is possible to print with colour.

ANSI escape codes are used to draw colour.

It is possible to use a third-party log system. Most logs are placed in /var/log. The systemd journal is stored in /var/log/journal and read with journalctl.

2.1.1.4.6.2 Debugger

Step debuggers: GDB, and so on.

It is possible to walk through the execution step by step.

2.1.1.4.6.3 Static checker

Try to detect errors without actually executing the program.

2.1.1.4.6.4 Inspect
2.1.1.4.6.5 Profiling

Measuring time is useful.

Real time: the wall-clock time the program takes from start to finish. User time: the CPU time the program spends in user code. System time: the CPU time the kernel spends on behalf of the program.

2.1.1.4.6.5.1 CPU profiling

Tracing: record all information while the program executes. Sampling: inspect the program at regular intervals.

Thus tracing results in a larger performance decrease.

Line profiler: the cost of executing each line.

2.1.1.4.6.5.2 Memory profiling
2.1.1.4.6.5.2.1 Analysis

Perf :

  • list:
  • stat:
  • record:
  • report:
2.1.1.4.6.5.3 Visualize
  • Flame Graph
  • Call Graph
2.1.1.4.6.5.4 Resource profiling
2.1.1.4.7 Meta programming

DSL everywhere.

2.1.1.4.7.1 Build systems
  • Describe how to build
  • Encode Rules

Make: GNU Make, BSD Make, NMake

2.1.1.4.7.2 Make Rules
2.1.1.4.7.3 Repositories
2.1.1.4.7.4 Version

Semantic Version

2.1.1.4.7.5 CI/CD

Continuous Integration & Continuous Delivery

  • Recipe:
  • Behaviour when something happened
2.1.1.4.7.6 Auto Testing
  • Test suite: a large collection of tests
  • Unit Test:
  • Integration Test:
  • Regression Test:
  • Mocking:
2.1.1.4.8 Security

Passwords need high information entropy.

2.1.1.4.8.1 Hash functions
  • non-invertible
  • collision resistant

Git identifies content by hash, so it needs a hash without collisions, unlike older hash methods.

2.1.1.4.8.2 Key derivation functions
2.1.1.4.8.3 Symmetric key cryptography
  • keygen() -> key
  • encrypt(plain text, key) -> cipher text
  • decrypt(cipher text, key) -> plain text
2.1.1.4.8.4 Asymmetric key cryptography
  • keygen() -> (public key, private key)

  • encrypt(P, public key) -> C

  • decrypt(C, private key) -> P

  • sign(msg, private key) -> signature

  • verify(msg, sig, public key) -> ok?

2.1.1.4.9 Misc
2.1.1.4.9.1 Change keyboard mapping

Remap fn keys, Caps Lock, shortcuts.

2.1.1.4.9.2 daemons

run commands in background

2.1.1.4.9.3 fuse
2.1.1.4.9.4 background
2.1.1.4.9.5 API
2.1.1.4.9.6 WM
2.1.1.4.9.7 MD
2.1.1.4.9.8 Boot
2.1.1.4.9.9 Docker
2.1.1.4.9.10 Interactive Notepad Programming
2.1.1.4.9.11 GitHub
2.1.1.5  MIT 6.001: Structure and Interpretation of Computer Programs (SICP) [S1]

“ Computer science is not about computers, any more than astronomy is about telescopes, or biology about microscopes. ”

Computer science is neither about science nor about computers. Rather than a subject that explores the nature of computation itself, it is an engineering discipline that focuses on building systems that perform computations, aka., how to use computers to solve problems.

Likewise geometry, which originally focused on measuring land, later evolved into an abstract mathematical discipline that studies the properties of space and shapes.

The main problem computer science tries to solve is how to describe the process of computation.

In mathematics, functions are used to describe relationships between quantities. In this respect, an equation cannot tell us how to compute the value of a function. Computer science provides a way to describe such a process, to compute and solve the functions.

The main purpose is to find a way to formalize such processes, to describe the process of computation itself. In some cases, systems are so large and complex that nobody can fully understand the whole system. That is why we need to build abstractions that help us manage the complexity of such systems. What makes this possible is the idea of procedures, which can be used to build abstractions: a technique to manage complexity.

A computer is a virtual environment that is not affected by real-world constraints, so a system can be built in any way we want. The only limitation is our imagination and creativity. An ideal system.

2.1.1.5.1 Preface
2.1.1.5.2 Section 1: Building Abstractions with Procedures

The first way to build abstractions is the black box, aka., the procedure, which accepts some inputs and produces some outputs without revealing the internal details of how it works. This is called encapsulation nowadays.

Fixed points: A fixed point of a function is a value that does not change under the application of the function. What we want here is a way to compute such fixed points and to package the process into a procedure. How to achieve this is instructive knowledge. How do we apply such a procedure? How do we use it to find the fixed points of other functions? And how do we build new procedures on top of it?
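As a rough illustration of packaging the fixed-point search into a procedure, here is a sketch in C (the course itself uses Scheme). The names `fixed_point` and `sqrt2_step`, the tolerance, and the iteration cap are all assumptions of this example.

```c
/* A function of one real variable. */
typedef double (*RealFn)(double);

static double abs_diff(double a, double b) { return a > b ? a - b : b - a; }

/* Apply f repeatedly, starting from guess, until two successive values
   differ by less than tolerance or max_iter applications were made. */
double fixed_point(RealFn f, double guess, double tolerance, int max_iter) {
  for (int i = 0; i < max_iter; i++) {
    double next = f(guess);
    if (abs_diff(next, guess) < tolerance) return next;
    guess = next;
  }
  return guess;
}

/* y -> (y + 2 / y) / 2: its positive fixed point is the square root of 2. */
double sqrt2_step(double y) { return (y + 2.0 / y) / 2.0; }
```

Calling `fixed_point(sqrt2_step, 1.0, 1e-12, 100)` treats the search as a black box: the caller only supplies a function and a starting guess, and other functions can be plugged in the same way.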

In this chapter, we'd talk about several topics:

  • Primitive Elements
  • Combinations
  • Abstract and how to build new abstractions
  • Extract common patterns
2.1.1.5.2.1 Lisp

The main purpose of this section is not to program in Lisp, but to learn how to think about programming. What we are about to learn is a general framework composed of primitives, means of combination, and means of abstraction.

Combinations of Lisp expressions are organized in a tree structure, aka., S-expressions. P.S., in a compiler, such a tree structure is called an Abstract Syntax Tree (AST).

2.1.1.5.2.2 define

The way to build new abstractions is to use define. By extracting general ideas from specific examples, it is possible to create new procedures.

2.1.1.6  Stanford CS107: Programming Paradigm [S1]
2.1.1.6.1 Data Types and Conversion
2.1.1.6.1.1 Binary Numbers

For positive numbers, plain binary addition gives the result directly (as long as it stays in range).

For numbers that may be negative, we need a way to represent the sign.

  1. Sign-magnitude: take the highest bit as the sign, 0 for positive and 1 for negative.

    A negative number is written by simply setting the highest bit to 1. Ordinary binary addition cannot be used with it: adding it to a positive number may give a wrong result.

       00000000 00000111   (+7)
     + 10000000 00000111   (-7)
    ---------------------
       10000000 00001110   (-14)

    We need a representation in which a negative number plus the corresponding positive number is 0 (with the carry out of the highest bit overflowing).

  2. One's complement: invert every bit of the value.

    A positive number plus the negative number with the same absolute value gives all 1s, which causes the +0 and -0 problem.

       00000000 00000111   (+7)
     + 11111111 11111000   (-7)
    ---------------------
       11111111 11111111   (0xffff)
  3. Two's complement: add 1 to the result of 2 to get the representation we want. In practice, the 1 is added into the negative number.

       00000000 00000111   (+7)
     + 11111111 11111001   (-7)
    ---------------------
    (1)00000000 00000000   (0x0000)

    The mathematical meaning of two's complement: addition modulo 2^n forms an abelian group, and the two's complement of a positive integer is its additive inverse.
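The three additions above can be reproduced with fixed-width unsigned arithmetic. A small sketch, assuming 16-bit words as in the tables; the function names are made up for this example.

```c
#include <stdint.h>

/* One's complement negation: invert every bit. */
uint16_t ones_complement_neg(uint16_t x) { return (uint16_t)~x; }

/* Two's complement negation: invert every bit, then add 1. */
uint16_t twos_complement_neg(uint16_t x) { return (uint16_t)(~x + 1u); }

/* 16-bit addition; the carry out of the highest bit is discarded. */
uint16_t add16(uint16_t a, uint16_t b) { return (uint16_t)(a + b); }
```

With these helpers, `add16(7, ones_complement_neg(7))` gives 0xffff (the "negative zero"), while `add16(7, twos_complement_neg(7))` gives 0, which is the additive-inverse property mentioned above.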

2.1.1.6.1.2 Characters

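A character is itself a number.

2.1.1.6.1.3 Convert

Assigning a smaller type to a larger one essentially copies the value into the low bits of the larger one.

Assigning a larger type to a smaller one simply discards the high bits.

When a value is widened, the high bits are filled with the sign bit for signed values (sign extension), or with 0 for unsigned values.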
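These conversion rules can be observed directly with fixed-width integer types. A minimal sketch; the helper names are assumptions of this example.

```c
#include <stdint.h>

/* Widening a signed value copies the sign bit into the new high bits. */
int32_t widen_signed(int16_t v) { return v; }

/* Widening an unsigned value fills the new high bits with 0. */
uint32_t widen_unsigned(uint16_t v) { return v; }

/* Narrowing an unsigned value keeps only the low 16 bits. */
uint16_t narrow(uint32_t v) { return (uint16_t)v; }
```

For example, the 16-bit pattern 0xFFF9 widens to 0xFFFFFFF9 when it is read as the signed value -7, but to 0x0000FFF9 when it is read as unsigned.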

2.1.1.6.1.4 Floats
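  1. Fixed-point binary fraction: some bits are used to represent $2^{-n}$

    The number of integer digits and fraction digits that can be represented is fixed.

    A floating-point number uses a finite number of bits and finite precision to approximate exact fractions in a dense number field.

  2. float 32: IEEE 754 base-2 floating-point number

    [sign] [<<--- magnitude --->>] [<-fractions>]
    [1/0 ] [exp(unsigned integer)] [base(2^{-n})]

    In effect, $val_{10} = (-1)^{sign} \times 1.base \times 2^{\,exp - 2^{bits(exp) - 1} + 1}$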
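The layout can be inspected by copying the bits of a float into an integer. This sketch assumes that `float` is the 32-bit IEEE 754 format (1 sign bit, 8 exponent bits, 23 fraction bits), which the C standard does not strictly guarantee; `FloatBits`, `decode_float` and `rebuild` are names made up for this example, and `rebuild` only handles normal numbers.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
  uint32_t sign;      /* 1 bit */
  uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
  uint32_t fraction;  /* 23 bits, the digits after "1." */
} FloatBits;

/* Split a float into its three fields. */
FloatBits decode_float(float f) {
  uint32_t bits;
  FloatBits out;
  memcpy(&bits, &f, sizeof bits);  /* copy the raw bytes, no conversion */
  out.sign = bits >> 31;
  out.exponent = (bits >> 23) & 0xFFu;
  out.fraction = bits & 0x7FFFFFu;
  return out;
}

/* (-1)^sign * 1.fraction * 2^(exponent - 127), for normal numbers only. */
double rebuild(FloatBits b) {
  double mantissa = 1.0 + (double)b.fraction / 8388608.0;  /* 2^23 */
  double scale = 1.0;
  int e = (int)b.exponent - 127;
  for (; e > 0; e--) scale *= 2.0;
  for (; e < 0; e++) scale /= 2.0;
  return (b.sign ? -1.0 : 1.0) * mantissa * scale;
}
```

For -6.5 = -1.625 × 2^2 the fields are sign 1, exponent 129 and fraction 0x500000, and `rebuild` returns -6.5 again. The bias 127 is the $2^{bits(exp) - 1} - 1$ term of the formula with 8 exponent bits.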

2.1.1.6.1.5 Endian

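The byte that holds the most significant bits is called the big end; the byte that holds the least significant bits is called the little end.

Little-endian: the little end is stored at the lowest address. Big-endian: the big end is stored at the lowest address.

Big-endian matches the human reading order.

What a pointer to a multi-byte value points to is affected by the byte order.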
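A short sketch of how byte order shows up in C: looking at the first byte of a multi-byte value reveals the host order, while shifting produces a fixed order on any host. The function names are assumptions of this example.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the lowest address holds the least significant byte. */
int is_little_endian(void) {
  uint32_t value = 0x01020304u;
  unsigned char first;
  memcpy(&first, &value, 1);  /* the byte a pointer to value points at */
  return first == 0x04;
}

/* Store v in big-endian order, independent of the host byte order. */
void store_be32(uint32_t v, unsigned char out[4]) {
  out[0] = (unsigned char)(v >> 24);
  out[1] = (unsigned char)(v >> 16);
  out[2] = (unsigned char)(v >> 8);
  out[3] = (unsigned char)v;
}

/* Load a big-endian 32-bit value, independent of the host byte order. */
uint32_t load_be32(const unsigned char in[4]) {
  return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
         ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

Shifts work on values rather than on memory, so `store_be32` and `load_be32` behave the same on every machine; only `is_little_endian` depends on the host.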

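2.1.1.6.2 Structure ( struct )

A pointer points to the start address of the structure; the other members are accessed by their offset from that start address. (The start address is a base address, similar to the relation between base address and offset in assembly. In assembly the base is scaled by 0x10, while here the offset is counted in units of 0x1, and the offset of a member equals the total length of the members before it, plus any alignment padding.)

2.1.1.6.2.1 Array

A pointer points to the start address of the array; the other elements are accessed by their offset from that start address. This is similar to a structure, but the offset equals n times the element size.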
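Both access rules come down to "base address plus byte offset", which can be written out by hand. A minimal sketch; `struct point3` and `read_int_at` are hypothetical names used only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example structure with three int members. */
struct point3 {
  int x;
  int y;
  int z;
};

/* Read an int stored `offset` bytes after `base`, the way the compiler
   addresses a struct member or an array element. */
int read_int_at(const void *base, size_t offset) {
  int value;
  memcpy(&value, (const char *)base + offset, sizeof value);
  return value;
}
```

For a structure the offset comes from `offsetof(struct point3, y)`; for an `int` array the offset of element n is `n * sizeof(int)`.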

2.1.1.6.2.2 Generic

C-style generics:

#include <string.h>  /* memcpy, size_t */

/* Swap two objects of `size` bytes each, byte by byte. */
void swap(void* ap, void* bp, size_t size) {
  unsigned char tmp[size];  /* C99 variable-length array as scratch space */
  memcpy(tmp, ap, size);
  memcpy(ap, bp, size);
  memcpy(bp, tmp, size);
}

Compared with templates, C-style generics do not need to generate a different binary for every instantiation of the same algorithm, which avoids binary bloat.

For lsearch, see [ulibs.c: binsearch_linear](https://github.com/mujiu555/ublis.c)

Example for generic:

c
#include <stddef.h>  /* NULL */

/* Linear search over n elements of elem_size bytes each.
   cmpfn returns 0 when the key matches the element. */
void * lsearch (
  void* key,
  void * base,
  int n,
  int elem_size,
  int (* cmpfn)(void *, void *)
) {
  for (int i = 0; i < n; i ++) {
    /* step in bytes: cast to char* before doing pointer arithmetic */
    void * elemAddr = (char *) base + i * elem_size;
    if (cmpfn (key, elemAddr) == 0) {
      return elemAddr;
    }
  }
  return NULL;
}
2.1.1.6.3 Stack
2.1.1.6.3.1 Stack with int
2.1.1.6.3.2 Generic Stack
2.1.1.6.4 Memory Management

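If the elements should be destroyed together with the generic stack, a free function must be provided so that the stack can destroy them.

Be careful to distinguish a pointer from the address of the place where it is stored.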
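A sketch of a generic stack that takes such a free function. This is not the course's exact code: the struct layout, the initial capacity of 4, and every name below are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
  void *elems;       /* raw storage for len elements */
  size_t elem_size;  /* size of one element in bytes */
  size_t len, cap;
  void (*freefn)(void *elem_addr);  /* may be NULL for plain data */
} Stack;

int stack_new(Stack *s, size_t elem_size, void (*freefn)(void *)) {
  s->elem_size = elem_size;
  s->len = 0;
  s->cap = 4;
  s->freefn = freefn;
  s->elems = malloc(s->cap * elem_size);
  return s->elems != NULL;
}

int stack_push(Stack *s, const void *elem_addr) {
  if (s->len == s->cap) {  /* grow by doubling */
    void *grown = realloc(s->elems, 2 * s->cap * s->elem_size);
    if (grown == NULL) return 0;
    s->elems = grown;
    s->cap *= 2;
  }
  memcpy((char *)s->elems + s->len * s->elem_size, elem_addr, s->elem_size);
  s->len++;
  return 1;
}

/* Copies the top element into out; ownership moves to the caller. */
int stack_pop(Stack *s, void *out) {
  if (s->len == 0) return 0;
  s->len--;
  memcpy(out, (char *)s->elems + s->len * s->elem_size, s->elem_size);
  return 1;
}

/* Destroys the remaining elements through freefn, then the storage. */
void stack_dispose(Stack *s) {
  if (s->freefn != NULL)
    for (size_t i = 0; i < s->len; i++)
      s->freefn((char *)s->elems + i * s->elem_size);
  free(s->elems);
  s->elems = NULL;
  s->len = s->cap = 0;
}

/* Free function for a stack of char*: it receives the ADDRESS of the
   stored pointer (a char**), so it must dereference once before freeing. */
void free_string(void *elem_addr) { free(*(char **)elem_addr); }
```

`free_string` shows why pointer and address must be kept apart: the stack hands over the address of the slot, and the heap pointer to release is what is stored in that slot.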

2.1.1.6.5 Memory Segments

Soft managed memory:

When a program is loaded into memory, the heap part is managed by malloc, realloc, and free.

The memory space allocated for you contains a few more bytes just before the head: the metadata.

Thus, free(head+offset); is not allowed. free needs the metadata, and looking for it from an offset pointer will lead to a crash.

Furthermore, freeing an array is not allowed either, because an array is allocated on the stack and managed by the compiler, and it has no metadata.

The memory manager may split memory into segments, and allocate from one specific segment if the request is smaller than 2^n bytes.

2.1.1.6.5.1 Memory compose

Split a large region of memory and handle allocation through handles. A handle is a pointer that points to a pointer to the actual memory.

2.1.1.6.5.2 Stack segment

Stack depth is roughly proportional to the number of nested function calls.

When a variable or array is defined within a function, like main, it lives in that function's stack frame, which moves the stack top. (The stack grows towards lower addresses; similarly, the heap grows towards higher addresses.)

The stack top pointer marks the boundary between the stack and the gap. (The gap is the space between heap and stack.)

When a function is called, a stack frame is created for it; when the function exits, the stack top pointer goes back to where it was before the frame.

Relatively slow RAM (Compared to register):

High address        +-----------------------+       +-------------+
                    |                       |      /|    ARG-N    |
                    |                       |     / |    .....    |
                    |                       |    /  |    ARG-1    |
                    |                       |   /   |-------------|
                    |-----------------------|  /    |  <Ret Addr> | <- BP
                    |                       | /    .|-------------|
                    |         Stack         |/    / |   <Old SP>  |
                    |                       |    /  |-------------|
              BP -> |-----------------------| --`   |   Local-1   |
                    |         Frame         |       |   .......   |
              SP -> |-----------------------| ----. |   Local-N   |
                    |                       |\     `|-------------|
                    |                       | \     |    ARGs-N   | <- SP
                    |                       |  \    |             |
                    |                       |   \   |             |
                    |                       |    \  |             |
                    |                       |     \ +-------------+
                    |                       | <- "Gap"
                    |                       |
                    |                       |
                    |                       |
                    |                       |
                    |                       |
                    |-----------------------|
                    |                       |
                    |                       |
                    |         Heap          |
                    |                       |
                    |                       |
                    |                       |
                    |-----------------------|
                    |                       |
                    |     .Section code     |
                    |                       |
Low address         +-----------------------+
2.1.1.6.5.3 Memory Management

When allocating memory, the allocator reserves not only the memory you request, but also some extra memory for metadata.

text
|   total space allocated    |
| head | space you allocated |
       ^
       pointer points to

Sometimes the memory manager uses the free space itself to store the metadata of free blocks.

Allocate strategy:

  • Best fit
  • Worst fit
  • First fit
  • Continuous search

Sometimes the memory allocator returns more space than you need, but you can only rely on the space you requested.

Compact:

2.1.1.6.6 Section IX: Computer architecture

Suppose we have the code:

c
int i;
int j;

i = 10;
j = i + 7;
j ++;

Assuming memory segment:

text
       +-----------+
0xf000 |           |
0xeffc |           |
       |   | i |   | <- BP
       |   | j |   |
       |           | <- SP
       |           |
       |           |
       |           |
       |           |
       |           |
       +- - - - - -+
....
       +- - - - - -+
0x1000 |           |
       +-----------+

Assume i and j are packed together on the stack. BP stores the stack base address.

To access variable i, use [SP+4]. Thus, i = 10; could be written as mov [sp+4], 10

For j = i + 7, it should first load i and then do the ALU operation.

  • load i: mov r1, [sp+4]
  • add: add r1, 7

Then, mov [sp], r1. And, inc [sp]

2.1.1.6.6.1 Load / Store, ALU Operations
2.1.1.6.6.2 force conversion

A cast only cheats the compiler, not the assembler: the assembler knows only addresses.

2.1.1.6.7 Activation record: function call frame

Suppose we have the function:

void foo(int bar, int * baz) {
  char sninke[4];
  short * why;
  // ...
}

The arguments and the local variables are placed close to each other in the frame.

4 byte
        |         | baz
        |         | bar
        | < ret > |
        |         | sninke
        |         | why

When called from another function, like main:

int main (int argc, char * argv[]) {
  int i = 4;
  foo(i, &i);
  return 0;
}

We may have:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC <- sp

at the start.

Then, allocate space for variable i:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC
        |         |      <- sp

Assign to i:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC
        |    4    | i    <- sp

When calling foo, push the arguments for foo onto the stack:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC
        |         | i
        | argument| i
        | argument| &i   <- sp
2.1.1.6.8 Section XI: Swap, call in assembly
void foo() {
  int x = 11;
  int y = 17;
  swap(&x, &y);
}

In assembly, with _cdecl, arguments are pushed in reverse order:

_foo:
  push rbp
  mov rbp, rsp

  sub rsp, 8              ; x, y are 4 bytes each, total 8 bytes
  mov dword [rsp + 4], 11 ; x = 11
  mov dword [rsp], 17     ; y = 17

  lea rax, [rsp]          ; &y, the rightmost argument is pushed first
  push rax
  lea rax, [rsp + 12]     ; &x (rsp moved down by 8 after the push)
  push rax
  call _swap              ; assuming swap is assembled under the label _swap
  add rsp, 16             ; clean up stack after call

  mov rsp, rbp
  pop rbp
  ret

While swap may be written as:

void swap(int * a, int * b) {
  int tmp = *a;
  *a = *b;
  *b = tmp;
}

8 bytes are reserved for the saved pc and 16 bytes for the 2 arguments: on entry to swap, a is at rsp + 8 and b is at rsp + 16, since the program runs on an x86_64 machine. The leftmost parameter is pushed last, so it lies next to the saved pc.

In C:

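// Intel syntax: build with -masm=intel
void __attribute__((naked)) swap(int *ap, int *bp) {

  asm volatile(
      // fetch arguments from stack: [rsp] holds the return address,
      // [rsp + 8] holds ap and [rsp + 16] holds bp
      // (rcx is used as scratch because rbx must be preserved by the callee)
      "mov rcx, [rsp + 8];\n"
      "mov eax, [rcx];\n"  // eax = *ap
      "mov rcx, [rsp + 16];\n"
      "xchg eax, [rcx];\n" // *bp = old *ap, eax = old *bp
      "mov rcx, [rsp + 8];\n"
      "mov [rcx], eax;\n"  // *ap = old *bp

      "ret;\n"
      :
      :
      : "rax", "rcx", "memory");
}

void __attribute__((naked)) foo() {

  asm volatile(
      // initialize variables
      // push rbp;
      // mov rbp, rsp;
      // for better if possible
      "sub rsp, 8;\n"
      "mov dword ptr [rsp + 4], 11;\n" // x = 11
      "mov dword ptr [rsp], 17;\n"     // y = 17

      // push arguments right to left: &y first, then &x
      "lea rax, [rsp];\n"      // &y
      "push rax;\n"
      "lea rax, [rsp + 12];\n" // &x: the push above moved rsp down by 8
      "push rax;\n"

      "call swap;\n"

      "add rsp, 16;\n" // clean up calling

      // clean up stack
      // also possible to use
      // `mov rsp, rbp; pop rbp;`
      // if bp is set
      "add rsp, 8;\n"
      "ret;\n"
      :
      :
      : "rax", "memory");
}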

int main(int argc, char *argv[]) {
  foo();
  return 0;
}

Here the swap function is not implemented the way the C code above is written; it uses the xchg instruction instead.

2.1.1.6.9 Pre-process, Compile, Assemble, Link

Source Code -> Preprocessed Code -> Assembly Code -> Object File -> Executable File

2.1.1.6.9.1 Preprocessor
2.1.1.6.9.1.1 #define

Textual replacement applied to the source file before it is compiled.

  1. constant replacement

    #define SIZE 1024
    char buf[SIZE];
  2. parameterized macro

    #define MAX(a, b) ((a) > (b) ? (a) : (b))
    int x = MAX(3, 5);
2.1.1.6.9.1.2 #include
2.1.1.6.9.2 compiler
2.1.1.6.10 Section XIII:

What if we comment out #include <stdio.h>?

The program can probably still be compiled.

What if we comment out #include <assert.h>?

assert will be treated as an ordinary function, and the final object file will be missing that symbol at link time.

void foo() {
  int i;
  int array[4];
  /* because of alignment on x86_64 Linux there may be padding after array,
     so the bound used here is 7 rather than 4 */
  for (i = 0; i <= 7; i ++) {
    array[i] = 0;
  }
}

This may loop forever: writing past the end of array is undefined behaviour, and on a typical stack layout one of the out-of-bounds writes lands on i and resets it to 0.

What will happen if we call these two functions one right after the other?

void Declare() {
  int array[100];
  for (int i = 0; i < 100; i ++) {
    array[i] = i;
  }
}

void Print() {
  int array[100];
  for (int i = 0; i < 100; i ++) {
    printf("%d", array[i]);
  }
}

The two functions have the same stack layout, so Print may print the values 0 to 99 even though it never initializes its own array: Declare does not wipe the bit pattern it left on the stack when it returns. Reading an uninitialized array is undefined behaviour, so this is not something to rely on.

The technique is called "Channeling".

2.1.1.6.10.1 multiple arguments

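Arguments are pushed from right to left. This makes things easier for the compiler: the first argument always ends up at a fixed offset from the stack pointer, no matter how many arguments follow.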

2.1.1.6.11 Multiple Threads

The operating system gives each process its own virtual memory, so that a program can assume it owns all of memory.

The kernel tracks and maintains the virtual memory mapping tables, and the MMU uses them to map the virtual memory of each process onto physical memory.

Program execution is sequential.

When multiple processes share one piece of data, a process may act on the data after another process has already changed it. E.g., a process reads a variable and checks that it meets some requirement; just as it is about to operate on it, the scheduler switches to another process, which successfully operates on the variable. When the scheduler dispatches the original process again, it does not validate the variable a second time and performs the same operation on it, which causes an error.

This situation is called a race condition.

Code always has some critical sections: while code is executing inside one, it does not validate the shared data again.

The solution is to protect the critical section with a semaphore or a lock. When a process wants to enter the critical section, it first tries to acquire the lock.

A semaphore is an integer variable with atomic operations. When it is 0, the process cannot enter the critical section; when it is greater than 0, the process can enter, atomically decreasing the semaphore by 1. On leaving the critical section it increases the semaphore again, releasing the resource.

In other words, semaphore operations acquire and release resources.

2.1.1.6.11.1 Producer Consumer Problem

The producer generates data and puts it into a buffer. The consumer takes data from the buffer and processes it.

Consumer should not take data when buffer is empty. Producer should not put data when buffer is full.

Use two semaphores to track the number of empty slots and full slots in buffer.

2.1.1.6.11.2 Reader Writer Problem

The Reader Writer problem is a classic synchronization problem. There are two types of processes, readers and writers: readers can read the shared data simultaneously, while writers need exclusive access to it.

2.1.1.6.11.3 Philosophers Dining Problem

Five philosophers sit around a table, and every philosopher needs two forks to eat. When a philosopher wants to eat, they try to pick up the forks on their left and right. But if all philosophers pick up the left fork first, none of them will ever be able to pick up the right fork.

This is a deadlock.

2.1.1.6.11.4 Ice cream Shop Problem
2.1.1.6.12 Functional Programming Paradigm

In the functional programming paradigm, each function is treated as a regular mathematical function, which accepts some input and produces some output.

;car
;cdr

car in Scheme extracts the first element of a list, while cdr extracts the rest of the list.

This is known already, so in short, Mujiu will not explain more about Scheme here.

In Scheme, as in Lisp, the names car and cdr come from the IBM 704, the machine Lisp was first implemented on: car stands for "Contents of the Address part of Register" and cdr for "Contents of the Decrement part of Register".

2.1.1.7  From The C Programming Language To Theoretical Computer Science (Section −1) [S-1]
2.1.1.7.1 From The C Programming Language to Theoretical Computer Science
2.1.1.7.1.1 Section −1: Linux and Tool Chain
2.1.1.7.2 Contents
From The C Programming Language to Theoretical Computer Science
Section −1: Linux and Tool Chain
Intro
Virtualization
Virtual Machine
Full Virtualization & Semi Virtualization
Hardware Virtualization Support
Virtual Box
Operating System
Bootloader
Bootstrap
Kernel
GRUB, Systemd-boot
GNU/Linux, Minix, GNU/Hurd, *BSD, Illumos, Darwin, …: *nix (Unix-Like)
Distribution
Debian, Ubuntu, RHEL, Arch, NixOS, Slackware
Root Distribution
Why Ubuntu
Live CD
Bootstrap
Installation
Partition
Partition Table
File System
Log, CoW, Snapshot
User & Group
Privilege
Root user
Sudo
Terminal, Shell, Terminal Emulator & tty/n
FHS
home
root
bin & sbin
usr
User Commands
Sudoer Commands
commands, parameters, arguments
shell tricks, pipeline, i/o redirection
Foreground & Background
Process Suspend
signal
Terminal Reuse
Aliasing
SSH
Shell substitution
Command line Editor
Version Control
Build System
2.1.1.7.2.1 Intro
2.1.1.7.2.2 Virtualization
2.1.1.7.2.2.1 Virtual Machine
2.1.1.7.2.2.2 Full Virtualization & Semi Virtualization

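Full Virtualization:

Full virtualization emulates the hardware architecture and its execution in software, which makes it slow.

  1. qemu
  2. bochs

Semi Virtualization:

Semi virtualization is assisted by the hardware: instructions run inside the virtual machine can be handed straight to the hardware and executed by it directly. It needs hardware support and cannot emulate a different hardware platform.

  1. KVM
  2. Xen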
2.1.1.7.2.2.3 Hardware Virtualization Support
2.1.1.7.2.2.4 Virtual Box
2.1.1.7.2.3 Operating System
2.1.1.7.2.3.1 Bootloader
2.1.1.7.2.3.2 Bootstrap
2.1.1.7.2.3.3 Kernel
2.1.1.7.2.3.4 GRUB, Systemd-boot
2.1.1.7.2.4 GNU/Linux, Minix, GNU/Hurd, *BSD, Illumos, Darwin, …: *nix (Unix-Like)
2.1.1.7.2.4.1 Distribution
2.1.1.7.2.4.2 Debian, Ubuntu, RHEL, Arch, NixOS, Slackware
2.1.1.7.2.4.3 Root Distribution
2.1.1.7.2.4.4 Why Ubuntu
2.1.1.7.2.4.5 Live CD
2.1.1.7.2.4.6 Bootstrap
2.1.1.7.2.4.7 Installation
2.1.1.7.2.4.8 Partition
2.1.1.7.2.4.9 Partition Table
2.1.1.7.2.4.10 File System
2.1.1.7.2.4.11 Log, CoW, Snapshot
2.1.1.7.2.4.12 User & Group
2.1.1.7.2.4.13 Privilege
2.1.1.7.2.4.14 Root user
2.1.1.7.2.4.15 Sudo
2.1.1.7.2.4.16 Terminal, Shell, Terminal Emulator & tty/n
2.1.1.7.2.4.17 FHS
2.1.1.7.2.4.18 home
2.1.1.7.2.4.19 root
2.1.1.7.2.4.20 bin & sbin
2.1.1.7.2.4.21 usr
2.1.1.7.2.4.22 User Commands
2.1.1.7.2.4.23 Sudoer Commands
2.1.1.7.2.4.24 commands, parameters, arguments
2.1.1.7.2.4.25 shell tricks, pipeline, i/o redirection
2.1.1.7.2.4.26 Foreground & Background
2.1.1.7.2.4.27 Process Suspend
2.1.1.7.2.4.28 signal
2.1.1.7.2.4.29 Terminal Reuse
2.1.1.7.2.4.30 Aliasing
2.1.1.7.2.4.31 SSH
2.1.1.7.2.4.32 Shell substitution
2.1.1.7.2.4.33 Command line Editor
2.1.1.7.2.4.34 Version Control
2.1.1.7.2.4.35 Build System
2.1.1.8  From The C Programming Language To Theoretical Computer Science (Section I) [S1]
2.1.1.8.1 Section I: C Programming Language

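To get a glimpse of computer science, we first need to know a programming language; it can then lead us to some key concepts of computers and programming language design.

2.1.1.8.2 Intro

C has a long history. Ever since it appeared alongside Unix in the 1970s, it has been a favourite of developers all over the world, and it is still widely used today. From the dazzling variety of applications down to operating system kernels, everything can be written in C, and everything depends on C code.

For example, the vast majority of the world's servers run Linux, and the Linux kernel is written almost entirely in C. The phone in your pocket, if it is any Android phone, also has a Linux kernel. You could say that C drives most of the devices in the world. (Windows is not used as the example because, first, Windows is a closed-source product, and second, the Windows kernel is mainly written in Microsoft's own heavily modified C++.)

C is a high-level language. But what is a high-level language?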

2.1.1.8.3 High Level Language

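High-level languages are defined relative to low-level ones. By low-level languages we usually mean the assembly languages of the various machines. They are very powerful, since they drive the CPU directly, and very fundamental, since nothing later could be built without them.

But they also have a serious problem: they are tightly bound to a platform, and a piece of code only works on the platform it was written for. Even when the logic is similar or identical, you still have to adapt it to each platform's rules one by one. That is just the development stage, and you can already feel how painful it is to build programs in a low-level language. Once it comes to upgrading software, the same routine gets far worse and the complexity shoots up.

A high-level language abstracts over what low-level languages have in common, helping programmers write code that ports between platforms painlessly, or at least fairly easily.

A low-level language is like a special tool made for one particular device, usable only on that device. It can drive the hardware directly, but it is very tedious to write. A high-level language such as C or Python lets programmers write programs in a form that is easier to understand. The system "translates" your code into instructions the machine understands, so the program can run on different devices without you worrying about low-level details.

When working in C, operations such as addition, subtraction, multiplication and division are abstractions over the assembly instructions that handle data of different widths, and variables are abstractions that map directly onto a region of memory.

Take adding two numbers as an example. In C, addition on any platform is written a + b. In Intel-syntax macro assembly for the x86_64 architecture of IBM-compatible machines (quite a mouthful), it may be ADD AH, BH, ADD AX, BX, ADD EAX, EBX, or even ADD RAX, RBX. And that only covers the case where two general-purpose registers take part; with memory operands it gets much more complicated. (AT&T syntax would be more complicated still, since there the instruction name itself has to be considered.)

This makes porting programs far more convenient: there is no longer any need to adapt by hand to each platform.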

2.1.1.8.3.1 Mid-Level Language

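Although C is nominally a high-level language, many people disagree, because C offers no general memory management scheme. All memory has to be managed by hand by the programmer. This is convenient for systems programming, but it also causes plenty of problems such as memory leaks, and you still have to think about boundary issues much as you would in assembly.

For that reason, some people call C a mid-level or transitional language. It is only a difference in naming, though.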

2.1.1.8.3.2 Compile & Interpret

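A CPU can only understand and run binary machine code, so code written in human-readable form cannot be executed directly. It has to be compiled or interpreted.

source code -> (compile) -> assembly file -> (assemble) -> object binary -> (link) -> executable
  1. Compiling translates the code into assembly (or another language), runs an assembler to produce the corresponding binary code, and finally links it into a native executable (which ends up containing the structures the operating system needs).
source code -> (interpreter) -> output
  2. Interpreting skips the compilation step: a virtual machine or an interpreter executes the code as it reads the source file.

In practice, for modern languages the difference between compiled and interpreted languages is not that big. Java, for example, is compiled to JVM bytecode, and the bytecode is then interpreted by the JVM.

We call a language compiled or interpreted according to how it is usually run. C has to be compiled before it runs, so we consider C a compiled language. Python, which you may be familiar with, is executed by an interpreter, so we consider Python an interpreted language.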

2.1.1.8.4 Environment And IDE

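If you play games on a PC, you may have seen a game complain that the DirectX runtime is missing. Programming, like gaming, needs an environment. The environment used specifically for developing programs is called a development environment, and a software system that bundles and preconfigures all the tools needed for development together with the environment itself is called an integrated development environment (IDE).

On Windows, the most common C IDE is Microsoft Visual Studio. But this IDE and its accompanying toolchain are tailored to C++ and C#, and are not a great fit for C. Its mandatory project management and its abundance of features also tend to overwhelm beginners and distract from the core of learning C.

On macOS, Apple provides the Xcode IDE, though hardly anyone uses it unless they have to write Swift.

On Linux, the most common "IDEs" are (Neo)Vim and Emacs, which are not for everyone.

Since the platforms are hard to unify, and all three offer a fairly simple way to use the LLVM-Clang compiler as a C environment, we will set up the environment by hand as the first step of learning C. This is something most tutorials, training courses and schools do not teach, yet it matters a great deal for everything that follows.

Two other parts I personally consider important are how to use tools, and the difference between tools and knowledge. They are covered in "The Missing Semester of Your Computer Science Education" and "Introduction to Theoretical Computer Science" respectively.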

2.1.1.8.4.1 Environment Variables

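Environment variables can be seen as settings for programs: they tell a program how to work. Configuring "PATH", for instance, helps programs find the files or commands they need.

Put simply, to a program this is a dictionary lookup: to retrieve some information, first find the "key" in the index, then fetch the "value" for that key.

These key-value pairs control how programs behave. The environment variables you need to know now, and that will stay important later, are:

  • PATH: the PATH variable works like a signpost, telling the system where to look for the commands you type.
  • For example, when you want to compile a program with gcc, the system searches the directories listed in PATH for the gcc program. If it cannot find it, it reports an error.
  • When we type a command in the console (command line) and try to run it, the system searches along PATH. If the command is found, it is run; if not, an error is reported.
  • Of course, it is not only us who need PATH when running commands: many other programs also use PATH to find the programs they need. (The dynamic linker, ld-linux-x86-64.so, uses a similar variable, LD_LIBRARY_PATH, to find shared libraries.)
  • OK, actually PATH is the only one you need to know for now (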
2.1.1.8.4.2 Windows

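On Windows, changing environment variables is convenient and safe:

Open File Explorer, right-click "This PC", and in the context menu choose "Properties" - "Advanced system settings" - "Advanced" - "Environment Variables" to see the environment variable configuration window.

To edit any entry, just double-click it to bring up the edit dialog.

So, to install a C development environment by hand, you first download a compiler and then add the compiler's directory to PATH as described above. Compared with the alternatives, though, this is inconvenient, and updating the environment later is a hassle too.

Windows does have an easier way to install a C environment: WSL.

WSL stands for "Windows Subsystem for Linux", a tool Microsoft created to improve the developer experience. With WSL we can install and manage development environments just as easily as on Linux itself.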

2.1.1.8.4.3 Linux, MacOS & *nix

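On Unix and Unix-like systems, changing environment variables usually goes through the user's configuration files. In practice, though, installing a C environment on these systems needs no real changes to environment variables and takes just a few commands.

2.1.1.8.5 Hello, World

So here is our first program: Hello, World!

This example comes from The C Programming Language, and it has accompanied generation after generation of new programmers, carrying our cheer for the new world we create ourselves.

"Hello World" is the classic first example in programming. It simply prints a sentence to the screen, helping you understand the basic structure of code and how it gets run. Once you can write and run "Hello World", you are ready to learn more complex programs.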

#include <stdio.h>

int main(void) {
  printf("Hello, World!\n");
  return 0;
}

Write this code in any text editor and save it (not on the desktop) as hello.c.

Then we can start compiling:

  1. Open a terminal,
  2. Enter the directory: cd ${pwd}, where ${pwd} is the directory your file is placed in,
  3. Check that the file hello.c exists: type cat hello.c and press enter. Right after the command is entered, the content of the whole file is displayed. If the content printed on screen does not match what you see in your text editor, you have not saved the file properly. For example, with the code shown above the command responds with:

    #include <stdio.h>
    
    int main(void) {
      printf("Hello, World!\n");
      return 0;
    }

    on my computer.

  4. Finally, type clang hello.c -o hello. It prints nothing if there are no syntax errors or other problems.

Then we get a file named hello (hello is the file name; a suffix such as .exe is called the extension), which you may find in the file explorer. This is our executable!

Finally, type ./hello in the terminal to run it, and you will see its output:

Hello, World!

With that, you have completed the basic skeleton of a C program. Next we will briefly go through what each part means, so that you can try modifying the program and write a "Hello World" of your very own.

Try changing the source code so that it prints your name.

2.1.1.8.5.1 Explanation

Looks fantastic?

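Let us explain the structure of our program.

A C program is always composed in a similar order. Our "Hello, World" program has three parts: the header file import, the entry function (main), and the main expression.

2.1.1.8.5.2 Library

The core of C is small and contains only very basic features; everything else is provided through libraries. And because the language is rather bare-bones, using a library requires a description file that tells the compiler what the library provides.

In this program, for instance, the first thing is a line of text starting with '#'. It says that we bring in the definitions of a library called stdio.

The '#' sign marks the start of a "preprocessing directive", here "include". The include directive is used to include a file, in this case stdio.h.

Stdio is short for "Standard Input / Output". It defines the common input and output functions, and it will be the library we use most in later C programming.

How does the include directive decide which file to include? That depends on what the file name is wrapped in. Here we wrapped stdio.h in angle brackets ('<' and '>'), which tells the compiler to search the system paths and, if the file is found, to expand it in full at the directive. Had we wrapped stdio.h in double quotes ('"'), the compiler would first look in the current directory.

Try creating a stdio.h file in the same directory as hello.c and recompiling the program to see whether anything changes.

What if you change the angle brackets to double quotes? The printf "function" we discuss below, for example, is made known to the compiler by stdio.h.

And what is a function... We will keep that a secret for now; functions are explained in detail later.

Next comes the body of our program.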

2.1.1.8.5.3 main
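int main(void) {
  // ...
}

This part is where our program starts executing. Without it, the program cannot run.

Try it: what happens if you leave this part out and write only the printf("Hello, World!\n"); in the middle? When we press the run button, we are told that the program is not "legal". That does not mean we broke the law, of course; it means the program does not conform to the syntax of C.

If you look at the "PROBLEMS" panel at the bottom of Visual Studio Code, you will also see it report many problems in the file. We call what it reports error messages.

We call this part the "main function definition", and main is the main function.

It can basically be treated as a fixed form (there are four fixed forms in total, three for hosted environments and one for freestanding environments, but this one is all you need for now).

printf("Hello, World!\n");

is the only real content of our program: all it actually does is print "Hello, World".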

2.1.1.8.5.4 Function

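Both parts above mentioned the notion of a "function". What is a function? A function is a sequence of code, a bundle of functionality. By defining functions we can group different operations together, which makes programs easier to develop. We can also offer such functions to ourselves or to other people.

Examples are the printf function we used and the main function we defined.

Like functions in mathematics, a function can take some arguments and produce some output. Take a function of several variables from multivariable calculus,

$f(x, y, z): \mathbb{R}^{3} \to \mathbb{R}$

It takes the arguments x, y, z and, through a series of transformations, turns them into a single one-dimensional value.

The combination of printf and the parentheses after it is called a function call. It means the same as applying a function in mathematics.

printf(...) prints text to the screen in a given format; that is what "Print (with) format" means.

Here "Hello, World" is the argument of the function call: it tells printf what to output to the screen.

This is only a brief introduction, though; printf can do far more than this! A later section is devoted to it.

return 0;

This statement ends the function "main". When execution reaches it, the function stops running and "returns".

This touches on material from later sections, so for now just remember to end the main function with return 0;.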

2.1.1.8.5.5 Expression: Statement.

If you look closely, you will notice that both things inside main end with a semicolon.

The semicolon (';') marks the end of a statement. What is a statement? Statements are the basic units of the C programming language: every C program is made up of statements. For example, the simplest program is:

int main(){}

It contains nothing but a function definition. After all, every C program must contain at least one.

Statements come in many flavours, but the rules for them are much the same. Apart from a few special cases, everything you write in C ends with a semicolon.

Statements fall roughly into five kinds:

  1. expression statements
  2. function calls
  3. control flow statements
  4. compound statements
  5. empty statements

Each kind will be explained in detail later, but do remember: every statement ends with a semicolon;

2.1.1.8.6 Types

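C is a statically typed language. That one sentence already involves two new ideas!

  • What is a type,
  • What is static typing?

As a computer language, C ultimately operates on values. For different values, we decide by convention what "type" they have.

For example, integers between $-2147483648$ ($-2^{31}$) and $2147483647$ ($2^{31}-1$) are treated as "integers (Integer)". We also need to represent text, so there are the "character (Character)" type and the "string ([Character] String)" type.

But why tell types apart? Obviously a string cannot be handled as an integer, right! (Unless you view them as monoids in the category-theoretic sense... and even then you only unify the operations; you still cannot add a string to a number~)

So what is static typing?

Just as mathematics is not only about manipulating numbers but mostly deals with unknowns, programs have their own "unknowns" to work with. When computing something, we often need a thing called a "variable" to store intermediate results. Since a variable stores data, it needs a type too. After all, as just explained, data of different types have different properties and cannot be stored in the same way.

C goes one step further: to keep a variable's type from becoming unclear after repeated assignments, it makes us fix the type of data a variable can hold at the moment we define it. (That is not the real reason, of course. C needs the type information in order to allocate space for a variable, and different types generally need different amounts of space, so they cannot be mixed. The "memory model" part will explain this in detail, meow~ >w<) This is what we call a "static type" system.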

2.1.1.8.6.1 Literal

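A literal: just as we write down coefficients and constants when solving a maths problem, a literal is a constant that appears directly in the program.

Literals differ from constants in one respect: a literal truly cannot be changed, whereas a constant in a program merely states that a variable will not be changed... With some special tricks, a constant can be persuaded to open its heart and accept a new value.

2.1.1.8.6.2 Basic Data Types

For simple programming tasks, C defines some basic data types. They cover numbers, text and logic (well, not really).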

2.1.1.8.6.2.1 Integer

The family we use most, and will introduce first, is the integers:

  • short: short integer. It needs less memory than int, typically only 16 bits, and accordingly it can represent fewer values.
  • int: integer, the default data type in C, typically 32 bits, of which 31 binary digits carry the magnitude. The range $-2147483648$ to $2147483647$ mentioned above is what it can represent.
  • long: long integer, possibly longer than int, generally only used when dealing with large values.
  • long long: really long integer, at least 64 bits, and exactly 64 bits on mainstream platforms.

Whenever we write an integer in code, it naturally carries one of the types above. For example:

short s = 0;
int i = 65536;
long l = 2147483647;
long long ll = 2147483648ll;

Note: all the code above is written inside the main function!

Here 0, 65536 and 2147483647 are all "literals" of type "int", while 2147483648ll is a literal of type "long long".

What do the type and the equals sign in front of these numbers do... You will find out soon! But first, let us meet the variants of the integers:

  • signed: the signed prefix says that the type holds signed data. Integer types are generally signed by default.
  • unsigned: given the previous item, when we do not need the negative part we can of course use an unsigned type. When a variable is declared unsigned, its range changes from half positive and half negative to entirely non-negative, roughly like putting a superscript $+$ on $\mathbb{Z}$ to get $\mathbb{Z}^{+}$. Not only that, the range of its positive part also doubles.
  • Although they are called prefixes, they can also go solo: when only the prefix appears, C (the standard) automatically supplies an int.

Here are a few more examples:

signed int i = 2147483647;
unsigned int u = 2147482647u;

Integer literals may be written as:

<number>*<suffix>     for decimal notation     ; 10, 11, 5
0<number>*<suffix>    for octal notation       ; 0, 01, 077
0x<number>*<suffix>   for hexadecimal notation ; 0x0, 0x1a, 0xff
0b<number>*<suffix>   for binary notation      ; 0b1, 0b0, 0b10 (C23, earlier a GCC/Clang extension)
2.1.1.8.6.2.2 Literal Suffix

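Some of you may have noticed that some of our numbers are followed by letters. These letters, such as ll and ull, are called literal suffixes. They qualify the literal so that the compiler handles the value correctly.

Now look at this line:

long long ll = 2147483648ll;

Try removing the ll suffix from this literal and see what happens. With a modern compiler the program still compiles: an unsuffixed decimal literal takes the first of int, long and long long that can hold its value. The suffix matters when you want to force a wider type yourself.

By default, an integer we write down has type int, and it is only given a wider type when the value does not fit. Arithmetic on int literals is still carried out in int, so an expression such as 65536 * 65536 overflows even if the result is assigned to a long long.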

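2.1.1.8.6.2.3 Real numbers: float & double

Beyond integers we naturally have fractional numbers. In C they are called "binary floating point numbers", or "floating point numbers" for short.

C has three common floating point types:

  • float: the basic floating point type, 32 bits wide. Unlike integers, floating point numbers do not have an exactly representable range.
  • double: double precision floating point, with higher precision than float. It is also the type of an unsuffixed literal such as 2.0.
  • long double: an upgraded double.

Why is it called floating point? Because its decimal point is not fixed, of course.

Some may still wonder what a fixed decimal point even means. Aren't the digits of a decimal generally unlimited? Once again this comes down to the limits of what a computer can represent.

For instance, an amount of money can usually be written as "XX yuan Y jiao Z fen", right? If we want to express it uniformly in "yuan", we can write "XX.YZ yuan". In effect we unify every unit to "yuan" and pin "jiao" and "fen" to the two digits after the decimal point. This is a "fixed point number", or more precisely a "fixed point number scaled by 100".

With fixed point understood, "floating point" (or "moving point", a name I just made up) is easy to grasp. Fixed point is too rigid and only suits certain scenarios. So the idea is: if we record the position of the decimal point in some way, we can represent fractional numbers of any shape. Thus floating point was born. The "fixed point number" above is decimal, with base 10, whereas computers represent data in binary, so the floating point numbers we actually use are binary too. That explains the name "binary floating point number".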

2.1.1.8.6.2.4 Type Boost

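Mathematics mixes integers and fractions in arithmetic too. Try this first: what do we get in C from an operation whose result should be fractional?

printf("%d", 1 / 2);

The result is 0. Strange, isn't it?

In C, an operation between two integers only yields an integer. To get a floating point result, a floating point number must take part in the operation, for example

printf("%f", 1 / 2.0);

This yields 0.5.

Why? In C, when the operands of an operation have different types, the one with the smaller range is converted to the type with the larger range before the operation is carried out. We call this process automatic type conversion.

Here the int 1 meets 2.0, a floating point number of type double. Floating point has a larger range than int, so the int is promoted to double and the operation becomes 1.0 / 2.0 = 0.5.

The chart of automatic type conversions:

small -------------------------------------------------------> large
char, short, int -> unsigned int -> long -> long long -> float -> double -> long double

From left to right, types are promoted automatically.

Conversions that start from the integers are called "integer promotion". As you can see, char, short and int share the same stage of the chart, because the same integer promotion applies to all of them: by the rules of C, every type whose range is smaller than int is promoted to int before it takes part in an operation.

A char, a short int, or an int bit-field, in their signed or unsigned varieties, or an object of enumeration type, may be used in an expression wherever an integer may be used. If an int can represent all values of the original type, the value is converted to int; otherwise it is converted to unsigned int. This process is called integral promotion.

From the assembly point of view, this is simply widening a small register into a larger one, e.g. promoting the AH register to the EAX register.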

2.1.1.8.6.2.5 String & Char

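Besides numeric values, the other part is the character type and strings.

In maths class we write the result right after the word "Solution"; that is really the "output" step of producing a result. How does a computer, which also does mathematical computation, organise its output? With strings, of course:

printf("This Is A String");

The familiar printf again; what differs is the string it works on.

A string, as the name suggests, is a sequence of consecutive characters. We usually write a string literal as a run of text enclosed in double quotes.

And how do we write a character?

Simple: besides double quotes we also have single quotes. Ideally, any single character enclosed in single quotes is a character. But since some characters cannot be typed on a keyboard at all, there are other forms as well:

  • 'c': a character in single quotes
  • '\ooo': a character given in octal
  • '\xhh': a character given in hexadecimal

Of course, some characters go far beyond the width of a character (8 bits), so there is another character type: the "wide character" type.

  • L'c': a wide character in single quotes
  • L'\ooo': a wide character in single quotes, given in octal
  • L'\xhhhh': a wide character in single quotes, given in hexadecimal

As you can see, a wide character literal is just an ordinary character literal with an "L" prefix. In the same way, an ordinary string literal can be turned into a wide string:

wprintf(L"Hello World");

Note: Chinese characters all exceed the range of the character type, so why can an ordinary string hold text containing Chinese, as in printf("你好, 世界");? Because a string does not necessarily use one character variable per character. That may sound like a tongue-twister now, but it will make sense once we get to how strings are actually represented.

So wide strings are not really needed for representing text.

By the way, did you notice that when describing the integer types we never mentioned an 8-bit integer, the counterpart of the byte type that is so common in other languages? That is because C uses the char type in place of an 8-bit integer. Fortunately 8-bit values are not used much in C, so the substitution is not a big problem. When we really need one, char can stand in.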

2.1.1.8.6.3 Logical Values

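Of course, computers do not only process numeric values. Built from diodes, triodes, logic gates and transistors, they have a natively binary representation, and binary logic is one of the things programs deal with.

Starting simple: logic has two states, yes or no. C represents them in a very simple way:

  • value 0: no (false),
  • anything else: yes (true).

Simple, right?

2.1.1.8.6.4 Void Type

The types above are all quite concrete. But what if we need to express "there is nothing here"?

That is where the void type comes in. We will not explain much here; we will see it in use later.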

2.1.1.8.7 Mathematics Operations

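Having numbers alone does not let us compute; we also need to define operations on them.

So first of all, every numeric value, whether from the integer family or the floating point family, supports the familiar four arithmetic operations: +, -, *, /.

Operation  Description                                                   Form
+          adds two numbers and returns the sum                          A + B
-          subtracts the second number from the first, returns the difference  A - B
*          multiplies two numbers and returns the product                A * B
/          divides the first number by the second, returns the quotient  A / B

Since taking the remainder is so useful, C provides ways to do it for both integers and floating point numbers, and calls the operation "modulo" (the fmod family is declared in <math.h>):

Operation  Description            Form         Comment
%          modulo                 A % B
fmod       floating point modulo  fmod(A, B)   function call, for double only
fmodf      floating point modulo  fmodf(A, B)  function call, for float
fmodl      floating point modulo  fmodl(A, B)  function call, for long double

Below are four operators in C that act on variables, called the "increment/decrement operators"

Operation  Description  Form  Comment
++         increment    A++   returns the original value, then increases the variable by 1
++         increment    ++A   increases the variable by 1, then returns the new value
--         decrement    A--   returns the original value, then decreases the variable by 1
--         decrement    --A   decreases the variable by 1, then returns the new value

You can see the pattern: if the operator comes before the variable, the variable is updated first and the value is taken afterwards; if the operator comes after the variable, the value is taken first, and the variable is incremented or decremented once the value has taken part in the operation.

int i = 0;
printf("%d", i++); // => 0, i = 1;
printf("%d", ++i); // => 2, i = 2;
printf("%d", i);   // => 2
printf("%d", i--); // => 2, i = 1;
printf("%d", --i); // => 0, i = 0;
printf("%d", i);   // => 0

You can also see that these operators are no longer described as acting on values, but on "variables". So what is a variable? As mentioned before, a variable is something that stores a value. Since variables can store values and take part in operations, it is only natural to have operators that act on the value stored in the variable itself. Besides the increment and decrement operators covered here there are others, such as the assignment operator.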

2.1.1.8.7.1 Relation Operations

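Besides arithmetic, we can also compare values. In C, the operators that compare the magnitude of different values are called "relational operators".

Relational operators work on all numeric values. For strings, comparison is also very common, so string comparison functions are part of the standard library. Do you remember what a "library" is, from earlier? A library is something written by other people rather than provided by the C language itself, defining a set of useful functions for you to import.

OK, that was a digression. Here are all the common relational operators (and functions):

Operation  Description               Form             Comment
==         equal                     A==B             returns 1 if A equals B
!=         not equal                 A!=B             returns 1 if A does not equal B
>          greater than              A>B              returns 1 if A is greater than B
<          less than                 A<B              returns 1 if A is less than B
>=         greater than or equal to  A>=B             returns 1 if A is greater than or equal to B
<=         less than or equal to     A<=B             returns 1 if A is less than or equal to B
strcmp     string comparison         strcmp(A, B)     returns 0 if the two strings are equal, otherwise a negative or positive value following dictionary order
memcmp     memory comparison         memcmp(A, B, N)  compares the first N bytes of two memory regions, with the same return convention as strcmp

One thing must be noted, however: C has no chained inequalities. That is, in C you cannot write an expression like $A > B > C$ and get its mathematical meaning.

So what happens if you write such code by accident, say 1 < a < 10?

C treats it as a sequence of operations: the first expression is evaluated, and its result then takes part in the next one. Such sequences are governed by precedence, just as multiplication and division always go before addition and subtraction in a mathematical formula.

So for the expression above, 1 < a is evaluated first, and its result, whether 1 or 0, is then compared with 10. The expression therefore always evaluates to 1.

So be careful never to write a "chained inequality".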

2.1.1.8.7.2 Logical Operations

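Logical operations are also something C does all the time. So what is a logical operation?

A logical operation strings several logical values together and determines whether the final result is true or false.

For example, we just said that C has no chained inequalities, so how do we express a chained relation? This is where logical operations come in.

There are three main logical operations: or, and, not:

Operation  Description  Form  Comment
&&         logical and  A&&B  returns 1 if A and B are both non-zero
||         logical or   A||B  returns 1 if at least one of A and B is non-zero
!          logical not  !A    returns 1 if A is 0; returns 0 if A is non-zero

From this you can see that logical and, or and not differ quite a bit from logic gate operations. Bitwise logical operations will therefore be covered separately later...

Back to expressing a chained inequality: just write

1 < a && a < 10

Note that the logical operators are all "short-circuit" operators. What does that mean? If the left side of a logical operator already determines the result of the whole operation, the right side is not evaluated at all, and the result is returned directly.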

2.1.1.8.7.3 Associativity

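As mentioned above, operator precedence and associativity determine the order in which a sequence of operations is evaluated. So what exactly are the rules?

In the table below, operators nearer the top are applied first; the last column gives the direction in which operators of the same row are grouped.

Operations                           Description     Associativity
() [] -> . ++ --                     postfix         left to right
+ - ! ~ ++ -- (type) * & sizeof      unary           right to left
* / %                                multiplicative  left to right
+ -                                  additive        left to right
<< >>                                shift           left to right
< > <= >=                            relational      left to right
== !=                                equality        left to right
&                                    bitwise and     left to right
^                                    bitwise xor     left to right
|                                    bitwise or      left to right
&&                                   logical and     left to right
||                                   logical or      left to right
? :                                  conditional     right to left
= += -= *= /= %= >>= <<= &= ^= |=    assignment      right to left
,                                    comma           left to right

Complicated, isn't it? No need to worry: whenever you are unsure about operator precedence, simply wrap the order you want in parentheses to say that it goes first. The rest matches the intuition from mathematics quite well.

You may notice that, besides the basic arithmetic operations we have covered, this table contains some operators we have never seen.

If you look carefully, besides logical and and logical or, the table also has bitwise and, or, xor and not. We will get to know them very soon.

PS. Another important family is the assignment operators, which will be introduced after we have gone through the syntax of C in full.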

2.1.1.8.7.4 Binary Calculation

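Now we need a little simple maths: binary arithmetic.

First, what is binary arithmetic? It is arithmetic on binary numbers. That sounds like a tautology (and it is one), but it actually says quite a lot.

First of all, it says that the objects it works on are binary numbers, that is, numbers whose rule is to carry at two.

The base of binary is 2, and every digit can only be 0 or 1.

Binary numbers have some special properties. The most notable advantage is that each digit has only two states, which matches the on and off of a circuit exactly and makes a computer's job easier. Another property is that binary converts easily to and from hexadecimal and octal, although this is really an advantage of hexadecimal and octal, since their bases are both powers of two.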

2.1.1.8.7.5 Radix Convert

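Binary is friendly to computers, but rather awkward for humans, because we deal with decimal all year round.

So we have to handle all sorts of "radix conversion" problems.

Binary and decimal both denote numbers from the same set, so they can be converted into each other by fixed rules.

Converting binary to decimal means multiplying each digit by the corresponding power of two. It may sound complicated, but it is very simple in practice. E.g., for the binary number 1011, the decimal value is:

$(1011)_{2} = 1 \times 2^{3} + 0 \times 2^{2} + 1 \times 2^{1} + 1 \times 2^{0} = (11)_{10}$

Converting decimal to binary is similar: keep dividing the decimal number by two and taking the remainder:

$11 \div 2 = 5 \cdots 1$
$5 \div 2 = 2 \cdots 1$
$2 \div 2 = 1 \cdots 0$
$1 \div 2 = 0 \cdots 1$

Finally, write the remainders from bottom to top to get the binary number.

We said above that converting between binary and hexadecimal or octal is very convenient. How convenient exactly? For binary to hexadecimal, group the digits by four, pad the high end with 0 if needed, and replace each group with its hexadecimal digit. Octal is similar: group by three, pad the high end with 0, and replace each group with its octal digit.

Continuing with 1011:

$(1011)_{2} = (B)_{16}, \quad (1011)_{2} = (001\,011)_{2} = (13)_{8}.$

The reverse direction works in exactly the same way, which is very convenient.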

2.1.1.8.7.6 Bitwise Operations

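Besides the usual arithmetic we know from decimal, binary also offers some special operations. In C they show up as the bitwise operations.

In a computer, gate circuits provide the following logic gates in total: AND, OR, NOT, NAND, NOR, XOR and XNOR.

Their logic can be shown in the following table:

Operation  Description  Form      A     B     Result
AND        and          A AND B   1010  1100  1000
OR         or           A OR B    1010  1100  1110
XOR        xor          A XOR B   1010  1100  0110
NAND       nand         A NAND B  1010  1100  0111
NOR        nor          A NOR B   1010  1100  0001
XNOR       xnor         A XNOR B  1010  1100  1001
NOT        not          NOT A     1010  -     0101

Their rules are actually very simple:

  • AND outputs 1 if and only if both inputs are 1, otherwise 0;
  • OR outputs 1 as soon as one input is 1, otherwise 0;
  • NOT inverts its input: it outputs 0 for input 1, otherwise 1;
  • NAND is AND inverted: it outputs 1 when none or only one of the inputs is 1, otherwise 0;
  • NOR is OR inverted: it outputs 1 only when both inputs are 0, otherwise 0;
  • XOR is about being "different": it outputs 1 when the two inputs differ, otherwise 0;
  • XNOR is XOR inverted: it outputs 1 when the inputs are the same, otherwise 0.

So every gate with "not" in it can be built from AND, OR and inversion, and in fact all the other gates can be built from NAND gates alone.

The low-level implementation of a computer has logic gate operations, and C has the corresponding bitwise operations. A bitwise operation applies a gate operation to every digit of multi-bit binary numbers. There are four of them:

Operation  Description  Form  Comment
&          bitwise and  A&B   a bit is set to 1 if the corresponding bits of A and B are both non-zero
|          bitwise or   A|B   a bit is set to 1 if at least one of the corresponding bits of A and B is non-zero
^          bitwise xor  A^B   a bit is set to 1 if exactly one of the corresponding bits is non-zero, otherwise 0; different gives 1, same gives 0
~          bitwise not  ~A    every bit that is 0 becomes 1, every non-zero bit becomes 0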
2.1.1.8.7.7 Overflow

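Although a computer works with binary numbers, its capacity is limited; it cannot represent arbitrarily large integers the way mathematics can.

So when a number grows beyond the range the computer can represent, an "overflow" occurs. In most computers, the overflowing bits are discarded when this happens, leaving only a flag that records whether an overflow took place.

Most of the time we try to avoid overflow as far as possible, because it makes results differ from what we expect. When defining a variable, you therefore need to estimate the range of the data in advance and choose a suitable type for each kind of data.

But overflow is not always a bad thing; sometimes it gives us special advantages. The famous "Quake III" fast inverse square root algorithm, for example, is often cited as a trick that combines the raw bit-level representation of numbers with a linear approximation from calculus.

The way our computers represent negative numbers is also closely tied to overflow.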

2.1.1.8.7.8 2's Completion

计算机可以表示的数据是有限的, 最开始, 一块 CPU 只能计算8位二进制数, 那非常小, 只能表示 0255 之间的数据. 后来, 直到现在, 计算机也只能表示64位的数据. 当我们只考虑正数的时候, 它并不会出现很大的问题, 在整数范围内, 直接相加即可得到所需的结果. 即便是两数相加发生溢出了, 也可以相对简单的解决.

但是, 当需要考虑负数的时候, 情况就开始不一样起来了. 我们开始必须找到一种方式, 来区分一个数是正数还是负数.

最朴素的想法是, 我们舍弃一位的表示范围, 将这一位用于区分数的正负性. 于是, 我们就有了 "整数的原码表示" (Origin).

在我们需要表示的数值为正时, 原码与真值 (True Value) 相同. 而当需要表示负数的时候, 最高位会被写作1. 也就是说, 将最高位作为符号位, 记录数据是正还是负.

原码表示在数学运算中会导致非常大的问题, 因为, 负数参与运算时, 最高位为1, 与正数进行二进制加法, 可能会得到不正确的结果 — 一个更大的负数.

    0000'0000 0000'0111   (+7)
  + 1000'0000 0000'0111   (-7)
 -----------------------
    1000'0000 0000'1110   (-14)
    0000'0000 0000'0111   (+7)
  + 1000'0000 0000'0111   (-7)
 -----------------------
    1000'0000 0000'1110   (-14)

所以, 对于一个涉及到负数的运算, 不能直接采用通常的二进制原码表示, 简单的将负数的最高位置为1.

理想的负数表示, 需要保证运算完成后, 可以使得负数与对应正数相加值位0 (最高位产生1位溢出).

于是, 为了达成这样的结果, 我们选择将数值部分原样取反 这样就得到了 "反码" (1's Completion).

但是反码有同样的问题, 虽然可以避免正负数相加得到更大的负数, 但是一个正数, 和对应的负数相加, 得到的却不是原始的0, 而是全1, 这就会造成 +00 的问题.

    0000'0000 0000'0111   (+7)
  + 1111'1111 1111'1000   (-7)
 -----------------------
    1111'1111 1111'1111   (-0)

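So, since a number plus its negative does not give 0, we simply make up the difference with a 1: add 1 to the result of the ones' complement addition, and after the overflow is discarded, the result is the true 0 we wanted.

For practicality, this 1 is folded into the representation itself. That gives us the "two's complement" (补码, 2's Complement).

Of course, this is a conclusion that can be reached by experiment; two's complement actually has a deeper meaning.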

2.1.1.8.7.9 N's Complement

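N's complement is, in fact, the additive group of residue classes modulo N. For

$\mathbb{Z}_n = \mathbb{Z} \bmod n$, that is, $(\mathbb{Z}, +)$ taken modulo $n$,

closure and associativity hold, so we have the additive group of residues modulo N over $\mathbb{Z}$.

Given $n$, there are $n$ residue classes modulo $n$. If $r_i$ runs over a complete residue system modulo $n$ and $a$, $b$ satisfy $\gcd(n, a) = 1$, then the values $a \times r_i + b$ also form a complete residue system modulo $n$.

For values between $-n$ and $n$, let $b = n - a$, so that $a + b \equiv 0 \pmod{n}$. If we define $-a \equiv n - a$ for $1 \le a \le n - 1$, every negative number is congruent modulo $n$ to a corresponding positive number, and $n$ is called the complementation constant.

$-a$ is the additive inverse of $a$. Taking the complement with respect to $M$ therefore gives $-a = M - a$, with $M = 10^n$ written in the working radix. For this $M$, $-0 = M$ and $0 = 0$ are congruent modulo $M$.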

2.1.1.8.7.10 Bitwise Shift

Apart from the regular bitwise operations, there are some special ones as well. Can you imagine shifting every digit of a number?

We have already mentioned floating-point numbers, and you may think that a floating point is itself a kind of digit shift. But a floating-point number only moves the position of the point.

In a bitwise shift the point stays fixed at position #0, and all the digits move directly right or left.

  • Logical Shift Right: shift all digits to the right, relative to position 0. Every digit pushed below position 0 is discarded, and the vacated higher positions are padded with 0.

     | 0000'1001 0010'1111 | =>
    0 | 0000'1001 0010'111 | 1 =>
    00 | 0000'1001 0010'11 | 11 =>
    ...
  • Arithmetic (Mathematical) Shift Right: mostly the same as the logical shift right, but the vacated higher positions are padded with copies of the sign bit.

     | 0000'1001 0010'1111 | =>
    0 | 0000'1001 0010'111 | 1 =>
    00 | 0000'1001 0010'11 | 11 =>
    ...

    For positive numbers, this is exactly like the logical shift.

     | 1111'1001 0010'1111 | =>
    1 | 1111'1001 0010'111 | 1 =>
    11 | 1111'1001 0010'11 | 11 =>
    ...

    For negative numbers, the padding digit is 1 instead.

  • Shift Left: shift all digits to the left, relative to the highest position. Every digit pushed past the highest position is discarded, and the vacated positions from 0 upward are padded with 0.

       <= | 0000'1001 0010'1111 |
     <= 0 | 000'1001 0010'1111 | 0
    <= 00 | 00'1001 0010'1111 | 00
    ...
Operations  Description  Form    Comment
<<          SHL          A << B
>>          SHR          A >> B  Different machines may choose a different SHR method, logical or arithmetic (mathematical)

That is a brief introduction to bitwise shift operations. You may notice that shift operations really just perform multiplication and division.

How?

SHL by $n$ multiplies a number by $2^n$, and SHR by $n$ divides it by $2^n$.

All the discarded digits count as overflow.

2.1.1.8.8 Syntax

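C, as a language for communicating with the computer, has its own set of syntax rules.

As we saw in the previous sections, code that does not follow these rules is rejected with an "illegal" error.

So it is worth going through the syntax rules of C systematically.

Here is our sample program: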

/// file: main.c
#include <stdio.h>

// main function, the entry
int main(int argc, char* argv[], char* envp[]) {
  int integer_value;
  float float_value = 1.0;

  printf("Hello, World!\n" /* comment can appear anywhere */);
  integer_value = 10;

  printf("Calculate a + b: %d + %f = %f", integer_value, float_value, float_value + integer_value);
  return 0;
}

/* foo function, void parameter and empty body */
void foo(void) {
  // do sth.
}

In the program above, several lines contain things we haven't met before.

We'll explain all of them in this chapter.

2.1.1.8.8.1 Statements

The first thing I'd like to give you is the definition of a statement.

A C program is composed of statements, as we have mentioned before.

Statements define the operations the program will execute; each statement does something.

According to the C standard, every statement in C must end with a semicolon (';'), unless it is explicitly listed as not needing one.

For example, in

  int integer_value;
  float float_value = 1.0;
  printf("Hello, World!\n");
  integer_value = 10;

these are all statements.

Multiple statements can also be written on the same line. You may see this:

int i; i = 1;

Here we have written two statements, int i; and i = 1;

So it is not necessary to put a line break between two statements.

Line breaks are added only to make the code look clean and clear.

Also, because the end of a statement is determined solely by the semicolon, one statement may be spread over multiple lines.

int
i
=
10
;

This is legal as well.

But we would not write code this way. A more common use of this feature is:

int i = 10,
    j = 20;
2.1.1.8.8.2 Expression

Now that we know statements, the other important part of a C program is the expression.

An expression is a form that combines values and operations.

The most basic expressions we use in programs are calculations.

1 + 2
i = 0
printf("Hello, World")

These are all expressions, and each one finally yields the result of its operation.

A statement may contain expressions, but an expression by itself is not a statement.

Most of the time an expression produces a value that can be used later in the program.

Furthermore, expressions can be nested.

printf("%d", 1+1)

Here we have two expressions: the smaller one, 1+1, and the larger one that wraps it, printf("%d", ~).

Once we add a semicolon after it, the whole expression becomes a statement.

printf("%d", 1+1);

It is then ready to do something.

As you can imagine, since a function call is a valid expression that can be turned into a statement, we can also add a semicolon after a calculation to get a statement.

1;
8*2;

But such statements are meaningless: the value is computed and then thrown away.

2.1.1.8.8.3 Code Block

When programming, we sometimes want a group of operations to be executed together, as one unit.

For that we need code blocks, or "compound statements": statements composed and wrapped in one pair of braces. For example:

{
  int x;
  x = 1;
}

The group is treated as one large statement by the rest of the program.

And no semicolon is needed after the closing brace.

2.1.1.8.8.4 Empty Lines & Space

Spaces in code are not only for looks; we need them to separate different syntax objects.

For example, why do we always need a space between int and i? Because if we dropped it, the compiler would only see inti, which is no longer a type followed by a name but a single unknown identifier.

It is the same reason we must put spaces between words. (Even in Chinese.)

So whenever removing a space does not change the structure of the code, that space can be deleted.

Empty lines, lines that contain no code, work much like spaces. If an empty line is not required where it is, it is there only for looks and can be removed.

The example here shows a case where the spaces can be dropped.

int x = 1;
// Equals to
int x=1;
2.1.1.8.8.5 Comment

Comments are another thing that has no effect on our code. When the compiler meets a comment, it simply ignores it; that is, a comment behaves like a space in the code.

There are two ways to write comments.

  • /* ... */ : multi-line comment, also usable inline; anything between /* and */ is ignored.
  • // ... : one-line comment; anything after it up to the end of the line is ignored.

Look at the sample program above for a simple illustration of comments.

2.1.1.8.9 Variables & Variable space

Here we come to the most important part of a program. We'll learn what a variable is, how it is defined, and which operations act on it.

First of all, let's look at the relation between variables and values.

2.1.1.8.9.1 Data, Variable, Value

Data is something that represents something else and carries information; it is always the object we manipulate in a program.

But how do we describe a piece of data? We use something called a "variable": a slot with the right amount of space for storing the data.

In general, then, a variable is a space, a slot, that stores a value carrying some specific data.

2.1.1.8.9.2 Definition

Before we use a variable in our program, we must define it.

The basic forms of a variable definition are listed below:

<variable-type> <variable-name>;
[<decorator>] <variable-type> <variable-name> [= <literal-value>];

There is also another way to declare a variable:

extern <variable-type> <variable-name>;

From all of these we can see that, to declare a variable, we write it in the form "type name;".

Here, type can be any type specifier mentioned in the types section above.

For example,

int a;
int b;

Furthermore, once we have learnt structures, enumerations, unions and functions, we will have more forms of types.

2.1.1.8.9.3 Variable Name

One must-have element of a variable definition is the type. The other is the variable name.

Once we have defined a variable, we can refer to it by its name.

Just like calling someone by name.

Variable names in C must follow some rules:

  1. start with '_' or a letter ('$' is accepted by many compilers as an extension),
  2. contain no spaces,
  3. continue with '_', letters and digits (and '$' where the compiler allows it),
  4. have a total length of at most 63 characters, the number the standard guarantees to be significant,
  5. not duplicate a name already defined in the same scope, and not be the same as a keyword like 'int'.

Keywords are words reserved for special use in a C program, for example int, if and continue. C also has some names reserved for future use. Such names may happen to work, but using them is discouraged.

Here are the most commonly used keywords and reserved names:

auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, inline, int, long, register, restrict, return, short, signed, sizeof, static, struct, switch, typedef, typeof, union, unsigned, void, volatile, while, _Generic

Beyond the keywords, which cannot be used at all, there are extra naming rules.

Names that start with two underscores ('_'), and names that start with one underscore followed by a capital letter, are reserved for the compiler.

Names that start and end with two underscores are reserved for the system-wide standard library.

Names that start with one underscore followed by a lower-case letter and end with one underscore are, by convention, reserved for libraries.

Names written entirely in capital letters, with words separated by underscores, denote constants by convention.

2.1.1.8.9.4 Initialize

Finishing the declaration does not mean you have finished defining the variable.

A variable must be initialized before it is put to use. Otherwise you may get an arbitrary value when you read it.

The first assignment to a variable is called "initialization".

Only with both declaration and initialization can we say the variable definition is complete.

From the forms listed above, we can see that initialization can be done together with declaration.

int a = 10;
2.1.1.8.9.5 Assignment Operations

Assignment is an operation specific to variables.

The simplest one is written like an equation in math. We simply call it the assignment operation.

Operations  Description  Form
=           Assignment   A = val

After the program finishes an assignment operation, the value stored in the variable is replaced.

int i = 20;
printf("%d", i);
// => 20
i = 9;
printf("%d", i);
// => 9

So this is the meaning of "variable": a space that can store a value. An assignment operation finds that space and replaces the value inside. It is like a drawer that holds exactly one thing: you may put one thing inside, and you may empty the drawer and put a new one in.

2.1.1.8.9.6 Compound Assignment Operations

Beyond the regular assignment operation, there are some advanced ones. An assignment can be combined with another arithmetic or bitwise operation, which gives us the compound assignment operations.

Operations  Description                Form       Equivalent Form
+=          Addition Assignment        A += val   A = (typeof(A))(A + val)
-=          Subtraction Assignment     A -= val   A = (typeof(A))(A - val)
*=          Multiplication Assignment  A *= val   A = (typeof(A))(A * val)
/=          Division Assignment        A /= val   A = (typeof(A))(A / val)
%=          Modulus Assignment         A %= val   A = (typeof(A))(A % val)
^=          Bitwise XOR Assignment     A ^= val   A = (typeof(A))(A ^ val)
|=          Bitwise OR Assignment      A |= val   A = (typeof(A))(A | val)
&=          Bitwise AND Assignment     A &= val   A = (typeof(A))(A & val)
<<=         SHL Assignment             A <<= val  A = (typeof(A))(A << val)
>>=         SHR Assignment             A >>= val  A = (typeof(A))(A >> val)

The increment and decrement operations behave much like addition assignment and subtraction assignment:

int a = 0;
a++;// a=>1
a+=1; // Equivalent, a => 2
--a;// a=>1
a-=1;// a => 0
2.1.1.8.10 Type Conversion

As mentioned before, C is a typed language, and variables of different types occupy different amounts of space.

So to use a variable of type int as a long, we must convert its value to type long. This is called type conversion.

In the types section we learnt about type boost (promotion), which is a special kind of automatic type conversion. Automatic conversion always goes from a smaller range to a larger one, which is why we also need forced type conversion.

To convert a value from one type to another, write the target type in parentheses before the expression.

(int)10ll; // same as 10
(char)12;  // same as '\14'

char c;
int i = 3000;
c = (char)i;

But forced type conversion has a serious problem: it may lose precision. Converting from int to char goes from a larger range to a smaller one, and it simply discards the higher part of the int value. Converting short to int, by contrast, just puts all the data into the lower part of the int and everything is fine.

For example,

  Short: 0010'0000 1000'0011 =>
  Char:  1000'0011
  Int:   0000'0000 0000'0000-0010'0000 1000'0011

This may cause unexpected results.

Conversion from real numbers to integers introduces the same problem: every digit after the decimal point is simply dropped.

2.1.1.8.11 Input And Output

Programs do not only calculate; they also have to report the result. Input and output utilities are therefore indispensable.

In C, the most useful input and output functions are printf and scanf.

2.1.1.8.11.1 printf

printf stands for "print formatted"; it is a formatted output function.

Basically, printf displays information on the screen; its more advanced feature is formatting the output string.

2.1.1.8.11.1.1 Output

The most basic usage of printf is written as follows:

printf("output string")

Anything inside the quotation marks (the string delimiters), except '%', is displayed as is.

For example, the printf here prints "output string" to the terminal, the black-background window on your computer.

The name "terminal" comes from hardware of long, long ago.

One thing you must notice is that the example shown here is just an expression, not a statement. To make it work, you have to add a semicolon, ';', after the whole expression.

In most cases the system starts a new line on a carriage return, a line feed, or both. But printf never adds any of these after printing its content. So, to make the output look normal, you need to add a newline mark at the end of the string:

printf("string with new line mark at end\n")

Besides the end of the line, a newline mark can also be placed inside the string.

printf("string\nwith new line mark inside\n")

This does the same as the following:

printf("string\n");
printf("with new line mark inside\n");

(Why do we add a semicolon at the end of each line here? Because two separate expressions can never be written in one statement in this form.)

2.1.1.8.11.1.2 Placeholder & format

And how about the advanced features?

Formatting is provided by placeholders. Remember that I mentioned '%' before? The percent sign introduces a placeholder, which is why it cannot be printed directly with printf. To print '%' on the screen, write it as "%%" in the format string, the first argument given to printf.

Since printf means "print formatted", placeholders must do more than keep a percent sign from being printed literally. So let us look at placeholders more closely.

As we know, C classifies data into different types, so placeholders need different forms for printf to tell them apart. The part of a placeholder that does this is called the "type specifier". A full placeholder is written according to this syntax:

<placeholder> ::= '%' [flags] [width] [.precision] [length] <type specifier>
flags         ::= '-' | '+' | space | '#' | '0'
width         ::= <number> | '*'
precision     ::= <number> | '*'
length        ::= 'h' | 'l' | 'll' | 'L'

Looks complex? Just take a quick glance and move on; examples say more than the standard:

type specifier  Description                    Form  Expected Data
a, A            Output floats in hexadecimal   %a    Reals: float, double
d               Output integer in decimal      %d    Integers: char, short, int
o               Output integer in octal        %o    Integers: char, short, int
x, X            Output integer in hexadecimal  %x    Integers: char, short, int
u               Output unsigned in decimal     %u    Unsigned Integers: unsigned char, short, int
f               Output reals in decimal        %f    Reals: float
e, E            Output reals in exponent form  %e    Reals: float
g, G            Output reals in shorter form   %g    Reals: float
c               Output character               %c    Character: char
s               Output character string        %s    String: char[]
p               Output address                 %p    Pointer: *

And their long version variants:

type specifier  Description                           Form  Expected Data
ld              Output integer in decimal             %ld   Integers: long
lo              Output integer in octal               %lo   Integers: long
lx, lX          Output integer in hexadecimal         %lx   Integers: long
lu              Output unsigned in decimal            %lu   Unsigned Integers: unsigned long
lld             Output integer in decimal             %lld  Integers: long long
llo             Output integer in octal               %llo  Integers: long long
llx, llX        Output integer in hexadecimal         %llx  Integers: long long
llu             Output unsigned long long in decimal  %llu  Unsigned Integers: unsigned long long
lf              Output reals in decimal               %lf   Reals: double
le, lE          Output reals in exponent form         %le   Reals: double
lg, lG          Output reals in shorter form          %lg   Reals: double
%               Output %                              %%    None

Here is the flags part:

flags    Description                                         Form  Expected Data
-        Align left (default is right)                       %-d   None
+        Always show the sign ('+' is hidden by default)     %+d   None
(space)  Insert a space before a non-negative number         % d   None
#        Show '0', '0x' or '0X' with 'o', 'x', 'X';          %#x   None
         always show the decimal point with 'e', 'E', 'f';
         keep trailing zeros with 'g', 'G'
0        Pad with 0 instead of spaces                        %0d   None

Width, .precision and length:

part      Description                                                     Form       Expected Data
(number)  Minimal number of characters to print, padded with spaces;      %8d        None
          longer output is not truncated
*         Width is not given in the format string; it is passed as an     %*d        Integer: int
          int argument before the value to be formatted
.number   For d, i, o, u, x, X: minimal number of digits, padded with     %.10d %.f  None
          leading 0 if shorter; longer values are unaffected; value 0
          with precision 0 prints nothing
          For e, E, f: number of digits after the decimal point
          For g, G: maximal number of significant digits
          For s: maximal number of characters; by default everything
          is printed up to the terminating '\0'
          For c: no effect
          If no precision is given, integers default to 1
.*        Precision is not given in the format string; it is passed as    %.*d       Integer: int
          an int argument before the value to be formatted
h         Argument is treated as short, for i, d, o, u, x, X              %hd        None
l         Argument is long, for i, d, o, u, x, X                          %ld        None
          double, for f
          wide char, for c
          wide char string, for s
ll        Argument is long long, for i, d, o, u, x, X                     %lld       None
L         Argument is long double, for e, E, f, g, G                      %Lf        None

And printf returns the total number of characters it printed.

You should now be able to print the ASCII table using printf:

#include <stdio.h>

int main(void) {
  for (int i = 0; i < 128; i ++) {
    printf("ASCII: %5d, Char: %c;\n", i, i);
  }
}

The declaration of the printf function is written as:

int printf(const char * fmt, ...);

So, you can call it using the form:

printf("format string")
printf("format string", arguments)
printf("format string", arguments, arg2)
printf("format string", arguments, arg2, arg3)
...
2.1.1.8.11.2 scanf

Having learnt the output part, we should also take a look at input.

scanf is used roughly like printf, except for how its arguments are passed. scanf stands for "scan formatted", so it needs placeholders just as printf does.

Placeholders are written in this form:

<placeholder> ::= '%' ['*'] [width][modifiers] <type specifier>

Somewhat like printf, right?

part       Description                                            Form  Expected Data
*          Discard the input: skip data that matches the type     %*d   None
width      Maximum number of characters to read                   %8d   None
modifiers  Length modifiers for the type specifier, as in printf  %ld   None
type       The type the data is scanned as                        %d    None

type specifier    Description                                          Form                Expected Data
a, A              Floats                                               scanf("%a", &f)     float
c                 Characters; with a width, read that many characters  scanf("%c", &c)     char
                  into a char array                                    scanf("%3c", s)     char[]
d                 Decimal integer; '+' or '-' is optional              scanf("%d", &i)     int
ld                Decimal integer; '+' or '-' is optional              scanf("%ld", &l)    long
lld               Decimal integer; '+' or '-' is optional              scanf("%lld", &ll)  long long
e, E, f, F, g, G  Real number; sign and 'e' exponent are optional      scanf("%f", &f)     float
i                 Integer                                              scanf("%i", &i)     int
o                 Octal integer                                        scanf("%o", &i)     int
s                 String, delimited by blanks                          scanf("%s", s)      char[]
u                 Unsigned integer                                     scanf("%u", &u)     unsigned int
x, X              Hexadecimal integer                                  scanf("%x", &i)     int
p                 Pointer                                              scanf("%p", &p)     *
[]                Character set, a simplified regular expression       scanf("%[1-9]", s)  char[]
%                 A literal %                                          scanf("%%")         None

Sample question: A+B Problem:

#include <stdio.h>

int main(void) {
  int a, b;
  scanf("%d%d",&a, &b);
  printf("%d + %d = %d", a, b, a + b);
  return 0;
}
2.1.1.8.12 Conditional Statement

A program is not only a tool for calculation; it also helps people solve problems that require decisions.

So conditional statements were introduced. They decide what to do according to conditions.

2.1.1.8.12.1 If

The if statement has the form:

if (condition) statement

When the condition expression evaluates to true, the statement part is executed.

if (x < y)
  printf("x less than y");

As you can see, x < y is the condition expression, and if x is indeed less than y, the program outputs the message.

But this is only the simplest case. What if we want to execute multiple statements within an if statement?

Remember code blocks? A code block composes several statements into one. So:

if (max < x) {
  int tmp = x; // swap x and max by hand
  x = max;
  max = tmp;
  printf("x larger than current max, swap them");
}

Here, several statements are executed together when x is larger than the current max value.

2.1.1.8.12.2 If-Else

Besides the plain "if" statement, sometimes we also need an "else" part.

if (condition)
  then-statement
else
  else-statement

Just as with the if statement, when the condition is not 0 (that is, it holds), the then-statement is executed; otherwise the else-statement is executed.

You may also need to distinguish several cases, which you can write like this:

if (cond1)
  then1-statement
else if (cond2)
  then2-statement
else if (cond3)
...
else
  else-statement

This is simply a nest of if-else statements: each "else if" is a new if statement placed in the else part of the previous one. The flat layout is for looks; you can also write it like this:

if (cond1) {
  then1
} else {
  if (cond2) {
    then2
  }
  ...
}

Very clear.

2.1.1.8.12.3 Ternary if-else operator

Although if-else is enough in most cases, it is still a statement, not an expression. In some corner cases, writing with if-else therefore means more lines of code and more complexity.

So we introduce the ternary if-else operator. With it you get an expression, which you can then combine with other expressions.

The ternary if-else looks like this:

condition ? then : else

When the condition is true, the then part is evaluated; if it is false, the else part is evaluated. The value of the evaluated part becomes the value of the whole expression.

So, you may write:

int i = 10;
i = i - 100 < 0 ? 0 : i - 100;

Or, in C++, you may find you can write this: (we must mention C++ explicitly here because this style of ternary is not allowed in pure C, but many programmers do not distinguish C from C++)

int i = 0;
int j = 10;
(i < j ? i : j) = 1;

(The second case is valid in C++ because, when both branches are variables of the same type, the conditional expression itself yields a variable that can be assigned to; in C it only yields a value.)

Both are correct, but the second one is discouraged.

2.1.1.8.12.4 Switch-Case

In addition to the if-else statement, we also have the switch-case statement.

switch (object) {
  case label:
    statements
  case label:
  ...
}

A label is either "case literal-value" or "default", and braces are not needed when one case has multiple statements. Each label is an entry point: when the object matches a label, execution starts from the position of that label and continues until it meets a break statement.

A legal switch-case statement may then look like:

int i; // for random value
switch (i) {
  case 1:
  case 2:
    printf("less than 3\n");
    break;
  case 4:
    printf("larger than 3\n");
  case 5:
    printf("larger than 4\n");
  default:
    printf("do nothing\n");
    break;
}
2.1.1.8.12.4.1 Break statement

But what does the break statement do?

The break statement has two uses. The one here is to jump out of the execution sequence of a switch-case statement.

When C finds that the object matches a label, it executes every statement after the label until it meets the closing brace. In some cases (actually, in most cases) you do not want that. break cuts the process short: when a break statement is executed, control simply jumps out of the switch-case statement, and the remaining statements inside are not executed.

Although break is not mandatory in switch-case, it is a good habit to end each label's statements with a break.

2.1.1.8.13 Loop

What if you want to execute the same, or nearly the same, statements many times? Here we need loops.

A loop is a statement that executes other statements repeatedly according to some condition.

2.1.1.8.13.1 While

The while loop looks similar to the if statement,

while (condition)
  loop-body

and works similarly as well: as long as the condition is true, the loop-body is executed, again and again.

Furthermore, just as with the if statement, the body of the loop is still a single statement. If you want multiple statements to be executed, you must add braces.

while (1) {
  printf("infinite loop\n");
}
2.1.1.8.13.2 For

The for loop is another kind of loop; why it is called "for" may not be obvious,

for (initial; condition; update)
  loop-body

A for loop always has four parts.

The initial part lets us define loop variables and initialize them inside the loop. The condition part is the same as in the while loop: if it is true, the body is executed; otherwise the loop ends. The loop-body, as with if and while, is executed when the condition holds. And finally the update part: after the loop-body finishes, the for loop runs the update to change the loop variables.

for (int i = 0; i < 10; i ++) {
  printf("%d", i);
}

Another important point is that, of the four parts of a for loop, the initial, condition, and update parts can each be empty. An empty condition counts as true, so in some special cases you may find that

for (;;)
  body

is used as an infinite loop.
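
To see how the four parts fit together, the sketch below writes the same loop twice, once with for and once with while. The function names sum_for and sum_while are assumptions made for this example.

```c
/* Sum 1..n with a for loop: initial, condition and update
   all sit in the loop header. */
int sum_for(int n) {
  int sum = 0;
  for (int i = 1; i <= n; i++) {
    sum += i;
  }
  return sum;
}

/* The same loop written with while: the parts are spread
   around the loop instead. */
int sum_while(int n) {
  int sum = 0;
  int i = 1;          /* initial */
  while (i <= n) {    /* condition */
    sum += i;         /* loop body */
    i++;              /* update */
  }
  return sum;
}
```

Both functions return the same value for every n; the for form simply keeps the loop bookkeeping in one place.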

2.1.1.8.13.3 Do-While

But what if we need to execute the body at least once?

Then we need a do-while loop.

do {
  body
} while (condition);

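Unlike the other loops, a do-while loop tests its condition after the body, so the body always runs at least once. C itself allows a single statement without brackets as the body, but the brackets are almost always written, and the semicolon after the condition is compulsory.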
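
The difference from while shows up when the condition is false from the start. A minimal sketch, with made-up function names rounds_do_while and rounds_while, that counts how many times each body runs:

```c
/* Runs the body first, then tests the condition.
   Returns how many times the body was executed. */
int rounds_do_while(int n) {
  int rounds = 0;
  do {
    rounds++;
    n--;
  } while (n > 0);
  return rounds;
}

/* The same logic with while: the body may run zero times. */
int rounds_while(int n) {
  int rounds = 0;
  while (n > 0) {
    rounds++;
    n--;
  }
  return rounds;
}
```

Called with 0, rounds_do_while returns 1 and rounds_while returns 0; for positive n both return n.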

2.1.1.8.13.4 Break

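Here is the other use of break. When a break statement is executed inside the body of a loop, control jumps out of the whole loop. Everything after the break is skipped, even the update part of a for loop.

This is similar to its behaviour in switch-case. In both cases, only the innermost enclosing loop or switch is left.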
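
A minimal sketch of break inside a for loop, assuming a made-up function smallest_divisor that stops searching as soon as it finds a divisor:

```c
/* Hypothetical helper: smallest divisor of n greater than 1 (n >= 2). */
int smallest_divisor(int n) {
  int d;
  for (d = 2; d < n; d++) {
    if (n % d == 0) {
      break;    /* leave the loop at once; d++ is skipped */
    }
  }
  return d;     /* d == n when no smaller divisor exists */
}
```

Because the update part is skipped on break, d still holds the divisor that was found when the function returns.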

2.1.1.8.13.5 Continue

Sometimes you only need to skip the rest of the body without jumping out of the loop. Then you need the continue statement.

When continue is executed, the loop goes straight to its next round: in a for loop the update part runs, then the condition is tested, and if it is true the body is executed again.
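
A minimal sketch of continue, assuming a made-up function sum_odd that skips the even numbers:

```c
/* Hypothetical helper: sum of the odd numbers from 1 to n. */
int sum_odd(int n) {
  int sum = 0;
  for (int i = 1; i <= n; i++) {
    if (i % 2 == 0) {
      continue;   /* skip the rest of the body; i++ still runs */
    }
    sum += i;     /* reached only for odd i */
  }
  return sum;
}
```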

2.1.1.8.14 Array

When we deal with a small amount of data, defining a few variables is enough. But what about a sequence of data?

For example, reading the scores of over 500 students and sorting them.

The average and the maximum can be computed with only one or two variables, but sorting requires storing every score.

An array is a linear, contiguous data structure that stores values of the same type.

A one-dimensional array is defined as follows:

type name[length];

Arrays can also be multi-dimensional.

type name[length][length];
type name[length][length][length];
...

Once an array is defined, it holds length elements. You access them by index, counting from 0 up to length - 1:

name[idx];

Each element can be treated as a regular variable of the type used to define the array.

We can then traverse the array with a loop:

int arr[10];
for (int i = 0; i < 10; i ++) {
  arr[i] = i;
}

Then, how can we initialize an array?

There are two main ways:

type name[] = {value1, value2, ...};
type name[length] = {value1, value2, ...};

type name[][length] = {value1, value2, ..., value6, ...};
type name[length][length] = {value1 ...};
type name[length][length] = {{value1, ...}, {value_length, ...}};
...

One way is to omit the length and just wrap the initial values in brackets; the array then has as many elements as there are initial values. The other way is to specify the length and also provide initial values wrapped in brackets; if fewer values than the length are given, the remaining elements are set to zero.

For multi-dimensional arrays, you must specify the length of every dimension except the first. You can write all initial values in a single pair of brackets, or separate the elements of each row with their own pair of brackets.
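
The forms above can be tried out with a short sketch. The function names sum_grid and last_of_partial are made up for this example.

```c
/* Adds up the elements of a 2 x 3 array initialized row by row. */
int sum_grid(void) {
  int grid[][3] = {{1, 2, 3}, {4, 5, 6}};  /* first length deduced: 2 */
  int sum = 0;
  for (int row = 0; row < 2; row++) {
    for (int col = 0; col < 3; col++) {
      sum += grid[row][col];
    }
  }
  return sum;
}

/* Length given, fewer initial values: the rest become 0. */
int last_of_partial(void) {
  int arr[5] = {7, 8};
  return arr[4];
}
```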

2.1.1.8.14.1 C Style String

Finally, we come to strings.

As we mentioned before, strings and characters have a special relationship. In fact, strings in the C programming language are arrays of char.

C treats a char array that ends with the null character '\0' as a string.
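
A minimal sketch of what the terminator means in practice. The function names string_length and same_length are assumptions for this example; the standard library offers strlen for the same job.

```c
/* Hypothetical helper: counts the characters before the
   terminating '\0'. */
int string_length(char text[]) {
  int length = 0;
  while (text[length] != '\0') {
    length++;
  }
  return length;
}

/* A string literal and a hand-written char array holding
   the same string. Returns 1 when both lengths agree. */
int same_length(void) {
  char a[] = "cat";                  /* 'c', 'a', 't', '\0' */
  char b[] = {'c', 'a', 't', '\0'};  /* the same, written by hand */
  return string_length(a) == string_length(b);
}
```

Without the '\0' at the end of b, the loop in string_length would run past the end of the array.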

2.1.1.8.15 sizeof

Although it is possible to traverse arrays using a literal length, that is not very convenient.

To simplify this, we can use the sizeof operator:

sizeof(type)
sizeof(variable)
sizeof(array)

The sizeof operator yields the total size of the given type, variable, or array in bytes. So, to get the number of elements in an array, we can divide the size of the whole array by the size of one element:

int len = sizeof(array) / sizeof(type);
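
A minimal sketch that uses this division to traverse an array without writing its length as a literal. The function name sum_scores and the score values are made up for this example.

```c
/* Hypothetical helper: sums an array whose length is computed
   with sizeof instead of being written by hand. */
int sum_scores(void) {
  int scores[] = {90, 85, 70, 60};
  int len = sizeof(scores) / sizeof(scores[0]);  /* 4 elements */
  int sum = 0;
  for (int i = 0; i < len; i++) {
    sum += scores[i];
  }
  return sum;
}
```

Note that this works only where the array definition itself is visible. Once an array is passed to a function as a parameter, sizeof gives the size of a pointer instead.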
2.1.1.8.16 Iterator

Traversing an array with an index variable idx is one possible method. Another way to achieve the same goal is to use an iterator.

int a[10];
for (int*p = a; p < a + 10; p ++) {
  *p = 1;
}

Here we define p as an iterator for the array a, and it then walks over the whole array.

Here p is called a pointer to int.

More details will be covered in the Pointers section.
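
The same iterator can be used for reading as well as writing. A minimal sketch, with a made-up function name fill_and_sum:

```c
/* Hypothetical helper: fills an array through one iterator
   and sums it through another. */
int fill_and_sum(void) {
  int a[10];
  int value = 1;
  for (int *p = a; p < a + 10; p++) {
    *p = value;   /* write to the element p points at */
    value++;
  }
  int sum = 0;
  for (int *p = a; p < a + 10; p++) {
    sum += *p;    /* read the element p points at */
  }
  return sum;     /* 1 + 2 + ... + 10 */
}
```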

2.1.1.8.17 Function

A function is a kind of contract: it accepts some input and generates output. As with functions in mathematics, ideally the same input always produces the same output. Furthermore, the form of a function is almost the same as in math:

int func(int R);

You may read it as the function 𝑓: 𝑁 → 𝑁, or 𝑓(𝑥) ∈ 𝑁, 𝑥 ∈ 𝑁. And

float func(float a, float b);

may represent the function 𝑓: 𝑅 × 𝑅 → 𝑅, that is, 𝑓(𝑣) ∈ 𝑅 for 𝑣 = (𝑎, 𝑏), 𝑎, 𝑏 ∈ 𝑅.

Formally, the input of a function in C is zero or more parameters, and the output is the so-called "return value". There are also other ways to pass output values besides returning them.

Ideally, a function does not affect anything outside itself; such a function is called a pure function. In a normal program, however, functions often need to do more than calculate, for example perform I/O. Any operation that modifies memory or variables outside the function's own scope, or performs I/O, is a side effect of the function.

In particular, some functions in C return nothing at all and exist only for their side effects.
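
The distinctions above can be sketched with a few small functions. All names here (call_count, square, square_counted, print_square, square_into) are made up for this example; global variables and pointers, which it relies on, are only covered in later sections.

```c
#include <stdio.h>

int call_count = 0;   /* a variable outside any function's scope */

/* Pure: the result depends only on the argument. */
int square(int x) {
  return x * x;
}

/* Side effect: modifies a variable outside its own scope. */
int square_counted(int x) {
  call_count++;
  return x * x;
}

/* No return value at all: exists only for its side effect (I/O). */
void print_square(int x) {
  printf("%d\n", square(x));
}

/* Output passed through a pointer parameter instead of return. */
void square_into(int x, int *result) {
  *result = x * x;
}
```

Calling square(4) twice always gives the same answer and changes nothing else, while each call to square_counted also changes call_count.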

2.1.1.8.17.1 Definition

To understand functions in C, first look at how a function is declared.

A function declaration works much like a variable declaration, but its main purpose is to tell the compiler a function's name, return type, and parameters, rather than to allocate any space.

We call it a prototype.

<return-type> <function-name>(<parameters> ...);

Prototypes are usually placed in header files.

For example, the prototype of a function add that produces the sum of two integers looks like:

int add (int a, int b);

Here we declare the function add, which accepts two arguments, corresponding to the parameters a and b respectively.

Then, just as a variable must be initialized before it is referenced, a function must have an implementation before it can be called.

A function implementation looks roughly like the declaration, but with an extra function body:

<return-type> <function-name> (<parameters> ...) {
  <function-body>...
}

The body consists of regular statements and may also contain return statements.

The purpose of the return statement is to tell the program which value is the return value of the function.

It plays the role of the equals sign in 𝑓(𝑥,𝑦)=𝑥+𝑦.

Here we implement the function add:

int add (int a, int b) {
  return a + b;
}
2.1.1.8.17.2 Function Calling

Once a function has been defined, it can be used in our program with function call syntax.

As mentioned at the very beginning of this tutorial, a function call is written in this form:

<function-name> (<arguments> ...)

The arguments must match the parameters in order and type.

For example, if we have the function add defined earlier,

int add(int a, int b){
  return a + b;
}

Then we can use it like:

#include <stdio.h>

int main(void) {
  int a = 10;
  a = add(a, 20);
  printf("%d", a);
  return 0;
}

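The first argument we provide to add is the integer variable a, which has the same type as the parameter a. The second argument is the literal value 20; since an integer literal without a suffix is of type int in C when its value fits, it also has the same type as the parameter b. Thus, the function call is acceptable.

But what if we provide too few or too many arguments, or arguments of the wrong type? With a prototype in scope, the compiler rejects a call with the wrong number of arguments. An argument of a different arithmetic type is implicitly converted to the parameter type (for example, 2.5 passed to an int parameter becomes 2), and an argument that cannot be converted is reported as an error.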

2.1.1.8.17.3 Recursion

Since a function can be called within the body of other functions, there is no reason to prevent a function from calling itself.

A function that calls itself is called a recursive function.

For example, the factorial function can be defined using recursion:

int factorial(int n) {
  if (n == 0) {
    return 1;
  } else {
    return n * factorial(n - 1);
  }
}

The basic structure of a recursive function is the same as that of a normal function; the only difference is that it calls itself within its body.

But since a recursive function could otherwise call itself forever, it must have a terminating condition that stops further calls.

Here the if statement works as the terminating condition. When n equals 0, the function returns 1 directly, without calling itself again.

2.1.1.8.17.4 Function Tail Call Optimization

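In some cases, the last operation of a function is a call to another function. This is called a tail call.

If the last operation of a function is a call to itself, it is called tail recursion.

Normally, a very deep or infinite tail recursion overflows the stack, because every call adds a new stack frame. With tail call optimization, the compiler reuses the current frame for the tail call, which avoids the overflow. The C standard does not require this optimization, so whether it happens depends on the compiler and its optimization settings.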
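
As a sketch of what tail recursion looks like, the factorial function can be rewritten so that the recursive call is the very last operation. The names factorial_acc and factorial_tail, and the accumulator parameter acc, are assumptions made for this example.

```c
/* Hypothetical helper: acc carries the partial product, so
   nothing remains to be done after the recursive call returns. */
int factorial_acc(int n, int acc) {
  if (n <= 1) {
    return acc;
  }
  return factorial_acc(n - 1, acc * n);   /* tail call */
}

int factorial_tail(int n) {
  return factorial_acc(n, 1);
}
```

In the earlier factorial, the multiplication n * factorial(n - 1) still has to happen after the inner call returns, so that call is not a tail call. Here the multiplication is done before the call, which is what makes the optimization possible.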

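A common way to turn every call into a tail call, so that this optimization can apply, is Continuation-Passing Style.

2.1.1.8.17.4.1 Continuation-Passing Style

Continuation-Passing Style (CPS) is a style of programming in which control is passed explicitly in the form of a continuation: instead of returning a result, a function receives an extra argument, the continuation, and calls it with the result.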
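
C has no built-in continuations, but the idea can be sketched with a function pointer that stands for "what to do next". Everything named here (continuation, last_result, store_result, add_cps) is made up for this example, and function pointers themselves are only introduced in a later section.

```c
/* A continuation: what to do next with an int result. */
typedef void (*continuation)(int result);

int last_result = 0;   /* hypothetical place to observe the final value */

/* A continuation that simply records the value it receives. */
void store_result(int result) {
  last_result = result;
}

/* Direct style would be: int add(int a, int b) { return a + b; }
   In CPS the function returns nothing and hands the sum to k. */
void add_cps(int a, int b, continuation k) {
  k(a + b);            /* tail call: control moves on to k */
}
```

After add_cps(2, 3, store_result), the sum has been delivered to store_result rather than returned to the caller. Note that the call to k is the last operation of add_cps, which is why CPS code consists of tail calls.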

2.1.1.8.18 Assembly
2.1.1.8.18.1 Architecture
2.1.1.8.18.1.1 AMD64 (x86_64)
2.1.1.8.18.1.2 Aarch64 / arm64
2.1.1.8.18.1.3 MIPS / Loong
2.1.1.8.18.2 BUS
2.1.1.8.18.2.1 Bridges
2.1.1.8.18.3 CPU
2.1.1.8.18.4 Intel Syntax, AT&T Syntax
2.1.1.8.18.5 Memory Access
2.1.1.8.18.6 Commands
2.1.1.8.18.7 Direct Memory Access
2.1.1.8.19 Stack
2.1.1.8.19.1 Frames
2.1.1.8.19.2 Stack Variables, Local Variables
2.1.1.8.19.3 Recursion Function Expansion
2.1.1.8.20 Global Variables
2.1.1.8.21 Variable Scope
2.1.1.8.21.1 Dynamic Scope
2.1.1.8.21.2 Lexical Scope
2.1.1.8.21.2.1 Function Scope
2.1.1.8.21.2.2 Block Scope
2.1.1.8.22 Closure
2.1.1.8.23 Heap Space
2.1.1.8.23.1 Variable Allocation
2.1.1.8.24 Memory Management
2.1.1.8.24.1 Virtual Memory (OS)
2.1.1.8.25 Function Call
2.1.1.8.25.1 Function Stack
2.1.1.8.25.2 Function In Assembly
2.1.1.8.26 goto
2.1.1.8.27 User Defined Types
2.1.1.8.27.1 Struct
2.1.1.8.27.1.1 Bit Field
2.1.1.8.27.1.2 Simulate class Using Structure
2.1.1.8.27.1.3 Virtual Function Table
2.1.1.8.27.2 Enum
2.1.1.8.27.3 Union
2.1.1.8.28 Structure space, Memory Alignment & Offset
2.1.1.8.29 Pointers
2.1.1.8.29.1 Pointer offset, index & linked list
2.1.1.8.29.2 Array, Pointers Points To Continuous Memory
2.1.1.8.29.3 Function pointers
2.1.1.8.29.3.1 Form
2.1.1.8.29.3.2 Function As Function Pointer
2.1.1.8.29.3.3 Calling With Function Pointer
2.1.1.8.29.3.4 Simplified Function Call
2.1.1.8.29.4 Void Pointers
2.1.1.8.29.5 Pointer Convert
2.1.1.8.30 Pointer in Assembly
2.1.1.8.31 Exception
2.1.1.8.31.1 setjmp, longjmp
2.1.1.8.31.2 Try-Catch, Throw
2.1.1.8.31.3 SEH, Structured Exception Handling
2.1.1.8.31.4 Herbexception
2.1.1.8.31.5 Exception spread
2.1.1.8.31.6 Condition System
2.1.1.8.31.7 Continuous
2.1.1.8.32 Preprocessor
2.1.1.8.32.1 Header files, #include
2.1.1.8.32.2 Macro
2.1.1.8.32.2.1 C Style Macro
2.1.1.8.32.2.2 M4 Macro Language
2.1.1.8.32.2.3 C++ Template
2.1.1.8.32.2.4 Rust Procedure Macro
2.1.1.8.32.2.5 Rust Macro Rules
2.1.1.8.32.2.6 Macro Assembly, Pseudocode
2.1.1.8.32.2.7 Common Lisp Expansion Macro
2.1.1.8.32.2.8 Common Lisp Reader Macro
2.1.1.8.32.2.9 Scheme Hygiene Macro System
2.1.1.8.32.2.10 Scheme Syntax Rules
2.1.1.8.32.2.11 Scheme Syntax Case
2.1.1.8.32.2.12 Hygiene for the Unhygienic
2.1.1.8.32.3 Compiler Comments
2.1.1.8.32.4 #pragma
2.1.1.8.33 Meta-programming
2.1.1.8.34 Compiler
2.1.1.8.34.1 Compile Process
2.1.1.8.34.2 Compiler Driver
2.1.1.8.34.3 Assembler
2.1.1.8.34.4 Assemble
2.1.1.8.34.5 Assembly Code
2.1.1.8.34.6 Linker
2.1.1.8.34.7 Link
2.1.1.8.35 Executable File
2.1.1.8.35.1 Object
2.1.1.8.35.2 Executable
2.1.1.8.35.3 Executable File Format
2.1.1.8.35.3.1 Portable Executable (PE)
2.1.1.8.35.3.2 Executable Linkable Format (ELF)
2.1.1.8.35.3.3 Mach-O (Fat)
2.1.1.8.35.3.4 Common Object File Format (COFF)
2.1.1.8.35.3.5 Binary (Bin)
2.1.1.8.36 ABI
2.1.1.8.36.1 Function Call Conventions
2.1.1.8.36.1.1 __cdecl
2.1.1.8.36.1.2 __stdcall
2.1.1.8.36.1.3 __fastcall
2.1.1.8.36.1.4 thiscall
2.1.1.8.36.1.5 Microsoft 4-register fastcall __vectorcall
2.1.1.8.36.1.6 System V ABI syscall
2.1.1.8.36.2 Function Naming Convention
2.1.1.8.36.2.1 C Function Naming Convention
2.1.1.8.36.2.2 MSVC C++ Function Naming Convention
2.1.1.8.36.2.3 Rust Function Naming Convention
2.1.1.8.36.2.4 Common Lisp Naming Convention
2.1.1.8.36.3 Endian
2.1.1.8.36.4 Dynamic Linked Library
2.1.1.8.36.5 Static Linked Library
2.1.1.8.36.6 fPIE, fPIC
2.1.1.8.37 Multiple File Compile
2.1.1.8.37.1 Compile Unit
2.1.1.8.37.2 Object
2.1.1.8.38 Build Systems
2.1.1.8.38.1 C Project Management
2.1.1.8.38.2 Makefiles
2.1.1.8.38.3 AutoTools
2.1.1.8.38.4 CMake
2.1.1.8.38.5 VSXMake (VSProj)
2.1.1.8.38.6 XMake
2.1.1.8.39 Variable Decorator
2.1.1.8.40 asm volatile (assembly code : output operands : input operands : clobbers)
2.1.1.8.41 __attribute__((attribute))
2.1.1.8.42 _Generic
2.1.1.8.43 ..., va_start, va_arg, va_end Macro, stdarg.h
2.1.1.8.44 __VA_ARGS__
2.1.1.8.45 Variable Length Array
2.1.1.8.46 ASCII, EBCDIC, Unicode/UCS-2
2.1.1.9  From The C Programming Language To Theoretical Computer Science (Section II) [S2]
2.1.1.9.1 From the C programming language to Theoretical Computer Science
2.1.1.9.1.1 Object-Oriented Programming
2.1.1.9.1.2 Generic Types
2.1.1.9.1.2.1 Template
2.1.1.9.1.2.2 Type Erasure
2.1.1.9.1.3 Inheritance
2.1.1.9.1.3.1 Class Object
2.1.1.9.1.3.2 Prototype Chain
2.1.1.9.1.4 Polymorphism
2.1.1.9.1.4.1 Interface
2.1.1.9.1.4.2 Trait
2.1.1.9.1.4.3 Duck Type
2.1.1.9.1.5 Encapsulation
2.1.1.9.1.5.1 Accessibility
2.1.1.9.1.6 Object System
2.1.1.9.1.6.1
2.1.1.9.1.7 Turing Machine
2.1.1.9.1.8 Lambda Calculus
2.1.1.9.1.9 First Order Function
2.1.1.9.1.9.1 Church numeral
2.1.1.9.1.10 Formal Verification
2.1.1.9.1.11
2.1.1.10  D-Flat Compiler Frameworks [Compiler]
2.1.1.11  D-Flat System Main Description [D_Flat]
2.1.1.12  D-Flat Editor & IDE [Editor&IDE]
2.1.1.12.1 D-Flat Editor
2.1.1.12.2 Configuration Language
2.1.1.12.3 Plugin
2.1.1.12.4 Extension
2.1.1.12.5 IDE Layer
2.1.1.13  Lambda Calculator Simulator (SKI) for Project D-Flat [Lambda]
2.1.1.13.1 Lambda Calculator Virtual Machine Design
2.1.1.14  Lilies: S-Expression Language Build Upon D-Flat System [Lilies]
2.1.1.14.1 Abstract 摘要

Lilies (short for "List Interpret Language in s-Expression Syntax") is a dialect of the LISt-Processing language (Lisp).

This report describes the design and implementation of Lilies language.

Lilies is designed to be extremely simple and portable. With a small kernel, clear semantics, and a powerful macro system, Lilies makes it easy to combine expressions into higher-level constructs.

The language is designed to be extensible and flexible: its hygienic macro system lets users define new syntax and the corresponding semantics safely. A set of built-in special forms and macros is provided to simplify common programming tasks; these act as syntactic sugar over the core language.

Lilies aims to be efficient, practical, and safe. With a strong type system, an ownership model that enforces memory safety, and compile-time evaluation capabilities, the language helps programmers write efficient and safe code. Lilies can express complex algorithms and data structures in functional, imperative, declarative, message-passing, and other styles.

The standard library of Lilies is divided into two parts: a core language library that provides basic data types, syntaxes, and contracts; and a compile-time library that supplies macros and compile-time functions.

Lilies should be implemented with both an interpreter and a compiler, together with a REPL, a development environment, a debugger, and other tools, to provide a complete programming experience.

The language has a full type system: primitive types, composite types, generic types, and user-defined types, plus type annotations and type inference. The type system should support type inference, type checking, and type casting, and provide interface, trait, and generic programming capabilities.

Lilies should include a complete module system (module definition, import / export, and versioning) that supports dependency management and module resolution.

It should also include a complete exception handling system (exception definition, exception throwing and catching, and exception propagation) with custom exception types and hierarchies.

The language should support a continuation system (definition, capture, and invocation), including first-class continuations and continuation-passing style.

Finally, Lilies should provide a comprehensive metaprogramming system (macros, compile-time functions, and code generation) that supports hygienic macros and compile-time evaluation.

Lilies(全称 “List Interpret Language in s-Expression Syntax”)是一种列表处理语言方言。本报告描述了 Lilies 语言的设计与实现。

Lilies 的设计目标是极其简单且可移植。通过一个精简的内核、清晰的语义以及强大的宏系统,Lilies 能够方便地将表达式组合成更高层次的构造。该语言强调可扩展性与灵活性:其卫生宏(hygienic macro)系统使用户可以安全地向语言中添加新的语法及相应语义。语言提供了一组内建的特殊形式和宏以简化常见编程任务,这些可视为语法糖。

Lilies 追求高效、实用与安全。借助强类型系统、强制内存安全的所有权模型以及编译时求值能力,语言能够帮助程序员编写高效且安全的代码。Lilies 可用于以函数式、命令式或消息传递等风格表达复杂算法与数据结构。

Lilies 的标准库分为两类:一类是核心语言库,提供基本数据类型、语法与契约;另一类是编译时库,提供宏和编译时函数。

Lilies 应当同时实现解释器和编译器,并配套提供 REPL、开发环境、调试器及其它工具,以提供完整的开发体验。

该语言应具备完整的类型系统,包括基本类型、复合类型、泛型类型和用户自定义类型,并支持类型注解与类型推断。类型系统应支持类型推断、类型检查与类型转换,并提供接口、特征(trait)和泛型编程能力。

Lilies 应设计完整的模块系统,包含模块定义、导入导出与版本管理,模块系统应支持依赖管理与模块解析。语言还应设计完整的异常处理系统,包含异常定义、抛出与捕获以及异常传播,并支持自定义异常类型与异常层次结构。

Lilies 应设计完整的续体/延续(continuation)系统,包含续体的定义、捕获与调用,支持一等续体和续体传递风格(continuation-passing style)。

最后,Lilies 应设计完善的元编程系统,包含宏、编译时函数与代码生成,元编程系统应支持卫生宏与编译时求值。

2.1.1.14.2 Introduction 引言

A single generic programming language cannot satisfy all needs of all programmers. Therefore reducing language complexity is important: keep a small core and give users the ability to extend the language.

A simple, clear expression syntax and unlimited composability of expressions make it possible to construct a practical and effective programming language.

Lilies draws many design ideas from earlier Lisps and Scheme dialects: first-class functions (procedures), lexical scope, continuations, and macros. Syntax objects can be manipulated programmatically. In contrast to those languages, Lilies is designed with a strong static type system.

Lilies is intended to be a native language that can compete with C, or a compilation target upon which other languages can be implemented. In the D-Flat system, Marguerite is implemented on top of Lilies.

All symbols in Lilies share a single namespace, whether they are variables, functions, classes, interfaces, modules, or other entities. In each expression, operators and operands are distinguished by their positions.

Unlike some Lisp dialects that use function application to implement loops, Lilies provides full functional loop constructs as built-in syntax extensions (outside the minimal core). Tail-call optimization is provided to ensure loops are efficient.

Object-oriented classes are supported. Everything in Lilies is an object, including functions, classes, interfaces, and modules. Classes can be computed at compile time, enabling powerful metaprogramming and generic programming. With traits and interfaces, Lilies supports polymorphism and code reuse. Contracts enable design-by-contract programming. The language also provides full compile-time type checking and type inference.

Modules are first-class citizens: they can be defined, imported, and exported.

The language can capture continuations — the "rest of the computation" at any point — allowing advanced control-flow constructs to be built on top. When a continuation is captured it is saved as an "escape procedure", a function that can be invoked later to resume execution at the capture point. Delimited continuations are also supported.

For higher-level control, algebraic effects and handlers are supported. Although effect handlers can be implemented with continuations, Lilies treats them as a distinct construct with dedicated syntax and semantics.

A full functional exception system is provided. Exceptions can be defined, raised, caught, propagated, and in some cases resumed, allowing flexible handling.

There are several ways to extend the language; macros are the most powerful. Lilies’ macros are hygienic and let users parse ASTs, access or drop contextual information, and generate new syntax trees. Macro-generated syntax can be hygienic or intentionally unhygienic as needed. Syntax objects are first-class, permitting parsing, manipulation, and generation of syntax trees, especially within macros. Another extension mechanism is symbol generation: new expressions can be generated at compile time with specific symbols or attributes (similar in spirit to KSP for Kotlin or Roslyn for C#).

The macro system must ensure that macros can provide the same compile-time information as built-in syntax so the compiler can produce full error diagnostics.

The language is built on an attribute grammar so that each syntax node can carry attributes used to store type information, scope information, and other metadata.

Except for define, no construct may directly create new bindings in the current scope. The let and let: families create bindings through closure capture. The language is designed to be referentially transparent: variables, functions, classes, modules, and macros should be defined before use.

These features make Lilies a powerful tool for building complex software systems and a fertile platform for research in programming theory.

单一的通用编程语言无法满足所有程序员的所有需求。因此,简化语言复杂性很重要:保留最小核心,并赋予用户扩展语言的能力。

简单清晰的表达式语法以及表达式的无限可组合性,使得构建实用且高效的编程语言成为可能。

Lilies 在设计上借鉴了早期的 Lisp 和 Scheme 方言的许多思想:一等函数(过程)、词法作用域、continuations(续延/延续)和宏。语法对象可以以编程方式进行操作。与这些语言不同,Lilies 设计为具有强静态类型系统的语言。

Lilies 的目标是成为一门可与 C 竞争的本地语言,或作为其他语言的编译目标。在 D-Flat 系统中,Marguerite 就是建立在 Lilies 之上的。

在 Lilies 中所有符号共享同一个命名空间,不论它们是变量、函数、类、接口、模块或其他实体。在每个表达式中,运算符和操作数由其位置来区分。

不同于某些 Lisp 方言通过函数调用实现循环的做法,Lilies 提供完整的函数式循环构造,作为内建的语法扩展(而非核心)。同时提供尾调用优化以保证循环的高效性。

支持面向对象的类。Lilies 中的一切都是对象,包括函数、类、接口和模块。类可以在编译期计算,从而支持强大的元编程能力和泛型编程。通过 trait(特征)和接口,Lilies 支持多态和代码重用。通过契约(contracts),支持契约式设计。语言同时提供完整的编译时类型检查和类型推断能力。

模块是第一类公民:可以定义、导入和导出。

语言能够捕获任意时刻的 continuation(程序剩余计算),从而可以构建高级控制流构造。捕获的 continuation 会被保存为“逃逸过程”(escape procedure),这是一个可以稍后调用以从捕获点恢复计算的函数。Lilies 也支持定界(delimited)continuation。

为了实现更高层次的控制,Lilies 也支持代数效果(algebraic effects)及其处理器。虽然效果处理器可以用 continuation 来实现,但 Lilies 将它们作为独立的构造来提供,以便拥有更好的语法和语义支持。

提供了完整的函数式异常处理系统。异常可以定义、抛出、捕获和传播,并在某些情况下支持恢复,从而灵活地处理错误。

有多种方式扩展语言,其中宏是最强大的。Lilies 的宏是“卫生”的(hygienic),并允许用户解析抽象语法树(AST)、获取或丢弃上下文信息、生成新的语法树。宏生成的语法可以根据需要是卫生的或有意非卫生的。语法对象在 Lilies 中是一等公民,便于在宏中解析、操作和生成语法树。另一种扩展方式是符号生成:可以在编译时根据给定的符号或属性生成新的表达式(类似于 Kotlin 的 KSP 或 C# 的 Roslyn)。

宏系统必须确保程序中使用的宏在编译时能提供与内建语法相同的信息,以便编译器能给出完整的错误诊断。

该语言基于属性文法构建,每个语法节点都可以关联属性,用于存储类型信息、作用域信息或其他元数据。

除 define 外,任何构造都不能直接在当前作用域创建新的绑定。let 和 let: 系列通过闭包捕获来创建绑定。因此语言被设计为引用透明:变量、函数、类、模块和宏应在使用前定义。

这些特性使 Lilies 成为构建复杂软件系统的强大工具,同时也是计算机程序理论研究的良好平台。

2.1.1.14.2.1 Background 背景

The Lilies language is designed and implemented as part of the D-Flat system, with the goal of creating a practical programming language and a powerful tool for implementing other languages.

In the design of Lilies, many ideas and concepts from other programming languages are borrowed.

2.1.1.14.2.2 Guiding Principle 指导方略

The design of Lilies is guided by several principles:

  1. Simplicity: The language should be simple and easy to learn, with a small set of core constructs and clear semantics.
  2. Portability: The language should be portable, able to run on a variety of platforms and architectures.
  3. Extensibility: The language should be extensible, allowing users to define new syntax and semantics without modifying the core language.
  4. Orthogonality: The language should be orthogonal, with constructs that can be combined in a variety of ways without unexpected interactions.
  5. Uniformity: The language should be uniform, with consistent syntax and semantics across different constructs; source code should be treatable as data and vice versa.

For real world programming, the following principles are also important:

  1. Enable library creation and code reuse.
  2. Provide a strong type system to catch errors at compile time.
  3. Allow efficient code generation and execution.
  4. Support multiple programming paradigms, including functional, imperative, and declarative programming styles.
2.1.1.14.3 Overview 语言总览

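This chapter describes the basic concepts of the language, to help in understanding the later chapters. It is organized by syntactic entry, in the manner of a reference manual, and is not a complete description of the language. In some places it is neither complete nor normative.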

2.1.1.14.3.1 Variable, Slots & Fields 变量, 插槽与字段

A variable in Lilies is a piece of storage allocated to hold a value.

Slots are locations within an object that can hold values, named or not. In practice, a slot is storage allocated inside an object to hold a value.

Fields are similar to slots, but they are always named and are used to store values associated with a specific object instance.

2.1.1.14.3.2 Type System 类型系统

Every value in Lilies has a type. Types are used to classify values and determine what operations can be performed on them.

New types can be defined by combining existing types (structures) or by defining them inductively (recursive types).

Each type is distinct, defined by its name, structure, and behavior. Types can also have hierarchical relationships with other types through inheritance and subtyping. A type is a subtype of another type if and only if it inherits from that type and implements all the traits and interfaces that type implements.

Being a subtype does not mean that every value of the subtype can be treated as a value of the supertype. The only guarantee is that when a constraint requires a value of the supertype, a value of the subtype can be used instead.

Every type must derive a default "empty" value, together with its corresponding type, which is used when a value of that type is required but not provided. Every type has its own type checking rules, which are used to determine whether a value is of that type or not. Thus empty values can be distinguished from other values of the same type.

2.1.1.14.3.2.1 Basic Types 基本类型

Primitive types for Lilies language include:

  • Numbers
  • Booleans
  • Characters
  • Strings
  • Symbols
  • Pairs
  • Vector
  • Tuples
  • Any
  • None
  • Ignore
  • Meta
  • Unit
2.1.1.14.3.2.1.1 Number Tower 数字类型层次

Numbers in Lilies are organized in a type hierarchy known as the "number tower". At the base of the tower is the most general type, Number, which encompasses all numeric types:

  • Number
  • Complex
  • Real
  • Rational
  • Integer
  • Unsigned Integer
  • Zero

Below Unsigned Integer, there are specific types for different sizes of integers:

  • (int 8) or (uint 8)
  • (int 16) or (uint 16)
  • (int 32) or (uint 32)
  • (int 64) or (uint 64)

Zero is a special type that represents the value zero. It can be used to construct other numeric types.

Default Empty type for numbers is Zero.

2.1.1.14.3.2.1.2 Booleans 布尔类型

Booleans in Lilies are represented by the type Boolean, which has two possible values: #True (true) and #False (false). The boolean types are organized in a type hierarchy:

  • Boolean

    • True
    • False

Default Empty type for booleans is False.

2.1.1.14.3.2.1.3 Characters 字符类型

Characters in Lilies are represented by the type Character, which represents a single Unicode character. Default Empty type for characters is the null character type EOF, which has the only instance #\EOF.

2.1.1.14.3.2.1.4 Strings 字串类型

Strings in Lilies are represented by the type String, which represents a sequence of objects, typically characters. Default Empty type for strings is the Empty type, for which the only instance is the empty string "".

A string is serialized data: a contiguous sequence of bytes, whether those bytes are UTF-8 encoded text, raw bytes, or even serialized integers or complex objects.

In Lilies, there are different kinds of sequence data:

  • Strings, which are described here,
  • Vector, a fixed-size sequence of same-type elements,
  • Tuple, a fixed-size sequence of potentially different-type elements,
  • Array, a variable-size sequence of same-type elements,
  • List, a variable-size sequence of potentially different-type elements, stored as a linked list.
2.1.1.14.3.2.1.5 Symbols 符号类型

A symbol is a unique, immutable identifier used to represent a name or label in Lilies. Each symbol has its own name, which is a string. Symbols are often used as keys in associative data structures, such as hash tables or dictionaries. Two symbols with the same name are considered equal.

Symbols are interned, meaning that there is only one instance of a symbol with a given name in the system. When a symbol is created, the system checks if a symbol with the same name already exists, and if so, returns the existing symbol instead of creating a new one.
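
The lookup-before-create step described above is the general interning technique. The following is a C sketch of that technique only, not of how Lilies implements it; the fixed-size table, the linear search, and the names symbol_table, symbol_count, MAX_SYMBOLS and intern are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size intern table, for illustration only. */
#define MAX_SYMBOLS 64

static char *symbol_table[MAX_SYMBOLS];
static int symbol_count = 0;

/* Returns the unique stored copy of name, creating it on first use.
   Returns NULL if the table is full or allocation fails. */
const char *intern(const char *name) {
  for (int i = 0; i < symbol_count; i++) {
    if (strcmp(symbol_table[i], name) == 0) {
      return symbol_table[i];   /* already interned: reuse it */
    }
  }
  if (symbol_count == MAX_SYMBOLS) {
    return NULL;
  }
  char *copy = malloc(strlen(name) + 1);
  if (copy == NULL) {
    return NULL;
  }
  strcpy(copy, name);
  symbol_table[symbol_count++] = copy;
  return copy;
}
```

Because every name is stored once, two interned symbols with the same name are the same pointer, so equality can be tested by comparing addresses instead of comparing strings.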

Symbols have their own type, Symbol. The default empty type for symbols is None.

2.1.1.14.3.2.1.6 Pairs 对偶类型

Pairs in Lilies are represented by the type Pair, which represents an ordered pair of values. Pair is treated as a primitive type, but it takes generic type parameters, allowing pairs of any two types of values.

A chain of pairs in which each second element is another pair, ending with a pair whose second element is None, is treated as a list. Lists are thus linked lists constructed from pairs.

Default Empty type for pairs is the Pair::Empty type, for which the only instance is the pair (None . None).
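
How a list arises from pairs can be sketched in C. This shows the general cons-cell idea only, not Lilies itself: the struct pair, the functions cons and list_length, the restriction to int values, and the use of NULL in the role of None are all assumptions made for this example.

```c
#include <stdlib.h>

/* Hypothetical pair of an int and another pair;
   NULL plays the role of None. */
struct pair {
  int first;
  struct pair *second;
};

/* Builds a new pair; returns NULL if allocation fails. */
struct pair *cons(int first, struct pair *second) {
  struct pair *p = malloc(sizeof *p);
  if (p != NULL) {
    p->first = first;
    p->second = second;
  }
  return p;
}

/* Walks the chain of second elements until None (NULL). */
int list_length(const struct pair *list) {
  int length = 0;
  while (list != NULL) {
    length++;
    list = list->second;
  }
  return length;
}
```

Here cons(1, cons(2, cons(3, NULL))) builds the three-element list whose last pair has None as its second element.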

2.1.1.14.3.2.1.7 Vectors 向量类型

Vectors in Lilies are represented by the type Vector, which represents a fixed-size sequence of values. Vector is treated as a primitive type, but it takes two generic type parameters: the type of the elements and the size of the vector.

Default Empty type for vectors is the Vector::Empty type, a vector type that has size of 0 and type of None. The only instance of this type is the empty vector #().

2.1.1.14.3.2.1.8 Tuples 元组类型

Tuples in Lilies are represented by the type Tuple, which represents a fixed-size sequence of values of potentially different types. Tuple is treated as a primitive type, but it takes a variable number of generic type parameters, each representing the type of an element in the tuple.

Default Empty type for tuples is the Tuple::Empty type, a tuple type that has no elements. The only instance of this type is the empty tuple #<>.

2.1.1.14.3.2.1.9 Any 任意类型

Any type is the supertype of all types in Lilies. Every value in Lilies is of type Any. But Any type cannot hold any value directly nor be instantiated.

In practice, Any type is used as a placeholder type when the specific type of a value is not known or not important.

Any type has no default empty type.

2.1.1.14.3.2.1.10 None 空类型

None type is the subtype of all types in Lilies. None represents the absence of a value. None type can hold only one value, which is also called None.

In practice, None type is used to indicate that a value is missing or not applicable.

None type is the default empty type for Symbols, and itself.

2.1.1.14.3.2.1.11 Ignore 忽略类型

Ignore type is a special type that indicates that a value should be ignored. Values of Ignore type are not stored or used in any way. Ignore type is often used in situations where a value is required by the syntax or semantics of the language, but the value itself is not important. Ignore type has only one value, also a variable, which is also called Ignore.

In practice, Ignore type is used to indicate that a value should be ignored or discarded.

Ignore type is the default empty type for itself.

2.1.1.14.3.2.1.12 Meta 元类型

Meta type is the type of types in Lilies. A Meta value may be a structure description or a type generator.

Meta type always promises to be non-empty, thus has no default empty type.

2.1.1.14.3.2.1.13 Unit 单元类型

Every structure that has no fields is considered a Unit type. Thus a Unit type is not a primitive type, but a special structure type.

Unit types cannot have instances, and thus have no default empty type.

2.1.1.14.3.2.2 Syntax Object 语法类型

Syntax objects in Lilies are representations of code as data structures, together with contextual information such as scope and source location. Syntax objects are so special that they should be built-in and given first-class status in the language.

2.1.1.14.3.2.3 Closure Type 闭包类型

Functions in Lilies represent a mapping from a set of input values (parameters) to a set of output values (return values). They can also capture the lexical scope in which they are defined, forming closures.

Closure type constructs the type of a function, including the types of its parameters and return values.

2.1.1.14.3.2.4 Composite Types 复合类型

There are composite type constructors provided in Lilies language, including:

  • product types

    • tuples
    • pairs
    • vectors
    • lists
    • arrays
    • maps
    • structures
  • sum types

    • tagged unions
  • recursive types

    • linked lists
  • intersection types

    • traits
    • interfaces

Some of them are built-in primitive types with generic type parameters, such as tuple, pair, and vector. Others are constructed through type definition syntax, such as structures, unions, and recursive types.

Use type to define new recursive types by creating type generators that produce types based on type parameters. A type described with type does not actually create a new type; instead, it implements a new type checker that checks whether a value is of the described type.
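The idea of a type generator that yields a checker rather than a new type can be sketched in Python. This is an assumption-laden model, not the Lilies mechanism itself: list_of and is_integer are hypothetical names, pairs are modelled as two-element tuples, and None ends a list.

```python
def is_integer(value):
    """Checker for integers (bool is excluded on purpose)."""
    return isinstance(value, int) and not isinstance(value, bool)


def list_of(element_check):
    """Type generator: given an element checker, return a list checker.

    No new type is created; the result is only a predicate that walks
    the recursive structure (first, rest) until it reaches None.
    """
    def check(value):
        while value is not None:
            if not (isinstance(value, tuple) and len(value) == 2):
                return False
            first, value = value
            if not element_check(first):
                return False
        return True
    return check


# "Instantiating" the generator with a type parameter yields a checker.
integer_list = list_of(is_integer)
```

For example, integer_list((1, (2, None))) holds, while integer_list((1, ("x", None))) does not.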

2.1.1.14.3.2.5 Enum Types 枚举类型

Enumeration types in Lilies are a special form of tagged unions, which represent a set of named values.

2.1.1.14.3.2.6 Internal Types 内部类型

Internal types in Lilies are special types that are used by the language implementation itself, and are not intended to be used directly by programmers. The only exception is the Syntax Object type, which is used in macros and syntax manipulation.

2.1.1.14.3.2.7 Generic 泛型类型

Several strategies for implementing generic types exist in practice, including:

  • monomorphization
  • type erasure
  • dictionary passing / witness table
  • reified generics
  • boxing / universal representation
  • compile-time type computation / metaprogramming
  • canonicalization

In the Lilies language, compile-time type computation is the main approach used to implement generics.

2.1.1.14.3.2.8 Traits 特征与接口

Traits are a way to define shared behavior that can be implemented by multiple types. Furthermore, traits can be composed together to create new traits.

Traits can be used to constrain generic types, ensuring that a type parameter implements a specific set of behaviors. Traits can also be used to define dynamic dispatch rules, allowing methods to be called on values of different types that implement the same trait.

2.1.1.14.3.2.9 Type Dispatch 类型分派

When a value is used in an expression, the type of the value is determined through type dispatch.

2.1.1.14.3.2.10 Auto Type Detection 自动类型检测

When defining variables, functions, classes, and so on, if the type is not explicitly specified, the type will be inferred from the context.

2.1.1.14.3.3 Object System 对象系统

Objects are the core concept of the Lilies language. Although types in Lilies cannot inherit from other types in the traditional sense, the object system still provides other ways to achieve polymorphism and code reuse.

A class defines only the structure of an object; methods are implemented separately. With traits, it becomes possible to share method implementations across different classes and to extend object behaviour outside the class definition.

The concept of a generic function is borrowed from CLOS and renamed to interface in Lilies. With interfaces, user-defined methods can be called in the same uniform way as traditional functions. Another benefit is that interfaces are statically dispatched by default, making them more efficient than traditional methods.

The implement syntax creates methods for a specific class and assigns them to the corresponding class.

Some concepts are still borrowed from traditional OOP languages:

  • Fields: named slots associated with a specific object instance.
  • Properties: named slots that are used for value fetching only.

All objects in Lilies are referenced by value by default. To have an object referenced by reference, use a type wrapper.

A type wrapper can be an owning, garbage-collected, or reference-counted pointer.

This part describes the object system, definition of classes, and their possible literals.

2.1.1.14.3.3.1 Primitive Object 原始对象

Primitive objects in Lilies are built upon primitive types. Some primitive objects can be written in literal syntax.

Primitive objects cannot be split into smaller parts.

The primitive objects and their literal forms are:

  • Integer Object

    • [1-9][0-9]*
    • 0b[01]+
    • 0o[0-7]+
    • 0x[0-9a-fA-F]+
  • Float Object

    • [0-9]+\.[0-9]*([eE][+-]?[0-9]+)?
    • \.[0-9]+([eE][+-]?[0-9]+)?
    • [0-9]+[eE][+-]?[0-9]+
  • Character Object

    • #\description
    • #\'character
    • #\uXXXX
  • String Object

    • "string content"
    • #f"string content with escapes"
    • #b"raw string content"
  • Symbol Object

    • 'symbol-name
  • Boolean Object

    • #True
    • #False
  • Pair Object

    • '(first . second)

Above, quote syntax is used to create literal syntax for symbols and pairs.
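As a rough illustration of how the numeric and boolean literal patterns above could be recognised, here is a small Python sketch. It is not the Lilies reader: the PATTERNS table and the classify function are hypothetical names, and only the integer, float and boolean forms are covered. The regular expressions themselves are copied from the list.

```python
import re

# Literal patterns copied from the list above; each entry is
# (object kind, alternatives joined with "|").
PATTERNS = [
    ("Integer", r"[1-9][0-9]*|0b[01]+|0o[0-7]+|0x[0-9a-fA-F]+"),
    ("Float", r"[0-9]+\.[0-9]*([eE][+-]?[0-9]+)?"
              r"|\.[0-9]+([eE][+-]?[0-9]+)?"
              r"|[0-9]+[eE][+-]?[0-9]+"),
    ("Boolean", r"#True|#False"),
]


def classify(token):
    """Return the kind of primitive object a token denotes, or None."""
    for kind, pattern in PATTERNS:
        # fullmatch requires the whole token to match one alternative.
        if re.fullmatch(pattern, token):
            return kind
    return None
```

With this table, classify("0x1F") yields "Integer" and classify("1e10") yields "Float", while a token such as "abc" matches none of the three kinds.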

2.1.1.14.3.3.2 Classes, Fields, Properties & Traits 类, 字段, 属性与特征

Classes are user-defined structure types.

A class can explicitly declare that it inherits from a parent class, but that does not change the structure of the class. If a class is declared to have a parent class, it must implement all traits that its parent class implements.

Fields are named slots associated with a specific object instance. Each field has its own name and type. In a class definition, fields are declared with the define syntax.

Properties are named slots that are used for value fetching only. There are various ways to declare a field as a property; using setter and getter methods is one common way. However, it is encouraged to assign accessibility attributes to fields manually, to control read and write access at the internal, class-internal, package-internal, and public access levels.

Traits are used to define shared behavior that can be implemented by multiple classes. Traits can be implemented manually for a class, and user defined traits can be used to extend class behavior for a library defined class.

2.1.1.14.3.3.2.1 Definition of Classes 类的定义

Define a new class with the class syntax. E.g., to define a new class Point with two fields x and y of type Integer:

(define Point
  (class
    (define x Integer)
    (define y Integer)))

Here, the outer define declares Point as the class built with the class syntax, and the define forms inside the class body declare the fields x and y of type Integer. #:self this declares that, within the class body, this refers to the current instance of the class. Symbols starting with #: are keyword annotations, which pass attributes at function or macro application. Annotations starting with #& are another special kind, which pass attributes at function or macro definition. The most generic annotations are written as #@[attributes] and are attached to expressions. A later chapter describes all these annotations in detail.

Full syntax of class definition is described as:

class-definition ::=
'(' 'class' <inherits>
   { <fields> } ')'

<inherits>       => '(' { <class> } ')'
<fields>         =>
'(' ':fields' { <deffield> } ')'

<deffield>       =>
'(' 'define' <name> [ '#:type' ] <type> [ <init> ] ')'

The inherits clause declares the superclasses of the class being defined. The self clause declares the symbol that refers to the current instance of the class within the class body. The type clause declares the type of the class being defined.

With annotations, the accessibility of fields can be controlled: E.g.,

(define Point
  (class
    #@[accessibility x (read :public) (write :private)]
    #@[accessibility y (read :public) (write :private)]
    (define x Integer)
    (define y Integer)))

To define a field as variable, wrap its type with variable.

2.1.1.14.3.3.2.2 Definition of Traits 特征的定义

Define a new trait with the trait syntax. E.g., to define a new trait Drawable with a method draw:

(define Drawable
  (trait
    #:self self
    (define draw (function (self)))))
2.1.1.14.3.3.2.3 Method and Trait Implementation 方法与特征实现

Both methods and traits are implemented with the implement syntax.

implement unwraps the namespace of a class, and the methods defined within its body are assigned to the class function table. Furthermore, traits can unwrap the namespace of an object, in which case anything defined inside only extends the behaviour of that object.

(implement Point (Drawable)
  #:self self
  #:Type Self
  (define draw
    (lambda (self)
      #:returns (None)
      (print f"x: {(field self 'x)}; y: {(field self 'y)}"))))
2.1.1.14.3.3.2.4 Generic Function & Interface 泛义函数与接口
2.1.1.14.3.3.2.5 Method Dispatch 方法分派

When a method is called on an object, the method to be executed is determined through method dispatch.

2.1.1.14.3.3.2.5.1 Dynamic Dispatch 动态分派
((method object 'method-name) ...args)
;; or
({method-name object} ...args) ; for short
2.1.1.14.3.3.2.5.2 Static Dispatch 静态分派
((method Class 'method-name) ...args)
;; or
({method-name Class} ...args) ; for short
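
The difference between the two call forms can be modelled in Python as two ways of reaching the same method table. This is a sketch under assumptions: the text does not say how Lilies stores methods, so METHOD_TABLES, implement, static_method and dynamic_method are hypothetical names, and objects are modelled as dictionaries carrying a "class" entry.

```python
# class name -> {method name -> function}; a stand-in for the
# "class function table" mentioned in the text.
METHOD_TABLES = {}


def implement(class_name, methods):
    """Assign methods to the function table of a class."""
    METHOD_TABLES.setdefault(class_name, {}).update(methods)


def static_method(class_name, method_name):
    """Static dispatch: the class is named explicitly at the call site."""
    return METHOD_TABLES[class_name][method_name]


def dynamic_method(obj, method_name):
    """Dynamic dispatch: the class is read from the object at run time."""
    return METHOD_TABLES[obj["class"]][method_name]


def draw(self):
    return f"x: {self['x']}; y: {self['y']}"


implement("Point", {"draw": draw})
point = {"class": "Point", "x": 1, "y": 2}

via_object = dynamic_method(point, "draw")(point)
via_class = static_method("Point", "draw")(point)
```

Both calls end up in the same function; only the moment at which the class is known differs.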
2.1.1.14.3.3.2.5.3 Method Access 语法糖方法调用
2.1.1.14.3.3.2.5.4 Invoke 调用
2.1.1.14.3.3.2.6 Field & Property Access 字段与属性访问
2.1.1.14.3.3.2.7 Traits Shadowing 特征遮蔽
2.1.1.14.3.4 Expression
2.1.1.14.3.5 Apply & Evaluation
  1. Apply & Evaluation

    1. Value Pass
    2. Reference Pass

      1. Ownership transaction
      2. Move
      3. Borrow
2.1.1.14.3.6 Variable, Binding & Reference
  • Variable, Definition & Binding

    • Dynamic Scope
    • Lexical Scope
    • define
    • let & let: family
    • Dynamic In Lexical Scope
  • Form
  • Assignment
2.1.1.14.3.7 Procedure, Function & Method
  • Functions

    • Parameters
    • Rest Parameters
    • Parameter Stack
    • Return Values
    • Multiple Values Returning
    • Function Call
    • Multiple Value for Function Call
2.1.1.14.3.8 Name Space, Lexical Scope, Dynamic Scope, Closure
2.1.1.14.3.9 Generics
  1. Generics: Template

    1. Generic Macro
2.1.1.14.3.10 Macro
  1. Macro

    1. History: Compile-time calculation
    2. History: C-Style Macro
    3. History: defmacro
    4. Procedure Macro
    5. Hygiene for the Unhygienic Macro
  2. Syntax Rules

    1. History: Hygiene Macro
    2. Syntax Object
2.1.1.14.3.11 Symbol Generation
2.1.1.14.3.11.1 Expression Tree
2.1.1.14.3.12 Memory Management
  1. Pointer

    1. Reference Count
    2. Unique Ownership
    3. Raw Pointer
    4. Address
    5. Virtual Method Table: How dynamic dispatch implemented
  2. Ownership
  3. Garbage Collection
  4. Allocation

    1. alloc:stack : Object Allocated in Stack
    2. alloc:heap : Object Allocated in Heap
    3. new : Object creation
  5. Auto Life-cycle Detection
2.1.1.14.3.13 Continuations
2.1.1.14.3.14 Exception Handling
  1. Condition System
2.1.1.14.3.15 Module & Library
2.1.1.14.3.16 Top-Level
2.1.1.14.4
2.1.1.15  Margarita: Language as extension for Lilies in M-Expression [Margarita]
2.1.1.15.1 Abstract
2.1.1.15.2 Introduction
2.1.1.15.2.1 Background
2.1.1.15.2.2 Guiding Principle
2.1.1.16  STD: Standard Library For D-Flat System [StandardLibrary]
2.1.1.17  Turing Machine Simulator (R-M) for Project D-Flat [Turing]
2.1.1.17.1 Turing Machine Virtual Machine Design

The virtual machine works much like a real CPU with memory.

The virtual machine has the following properties:

  • 32-bit instruction width
  • 64-bit register size
  • 32 general-purpose registers
  • 32 special-purpose registers

The virtual machine adopts a newly designed instruction set.

2.1.1.17.2  Architecture Overview [Overview]

The virtual machine works in a register-memory architecture.

File:                                                 Memory:
+--------------------------------------+      +-------------------------------+
| Archive                              |      |  +--------------------++++    |
| +----------+    +---------------+++  |      |  | Global Data Stack  ||||    |
| | Global   | +->| Function Unit |||| |      |  |--------------------++++    |
| | +--+++   | |  | +--+++  +-++  |||| |      |  | Text Vector           |    |
| | | D |||  | |  | | T ||| |D||| |||| |      |  +-----------------------+    |
| | | a |||  | |  | | e ||| |a||| ||||==========>| Function Unit Vector  |<-+ |
| | | t |||  | |  | | x ||| |t||| |||| |      |  | +-------------------++|  | |
| | | a |||  | |  | | t ||| |a||| |||| |      |  | | +---------------+ |||  | |
| | +--+++   | |  | +--+++  +-++  |||| |      |  | | | Data Vector   | |||  | |
| | +--+++   | |  | +--+++  +-++  |||| |      |  | | | +---------+++ | |||  | |
| | | E |||  | |  | | C ||| |C||| |||| |      |  | | | | Literal ||| | |||  | |
| | | n |||  | |  | | o ||| |l||| |||| |      |  | | | +---------+++ | |||  | |
| | | t -------+  | | s ||| |o||| |||| |      |  | | | | capture ||| | |||  | |
| | | e |||  |    | | t ||| |s||| |||| | Load |  | | | +---------+++ | |||  | |
| | | r |||  |    | | a ||| |u||| |||| | ===> |  | | | | data    ||| | |||  | |
| | | y |||  |    | | n ||| |r||| |||| |      |  | | | +---------+++ | |||  | |
| | +--+++   |    | | t ||| |e||| |||| |      |  | | +---------------+ |||  | |
| |          |    | +--+++  +-++  |||| |      |  | | | Text          | |||  | |
| |          |    |               |||| |      |  | | +---------------+ |||  | |
| +----------+    +---------------+++  |      |  | +------------------+++|  | |
|                                      |      |  +---------------------+++  | |
+--------------------------------------+      |  | Execution Stack     |||  | |
                                              |  | +---------+++       |||  | |
                                              |  | | Pointer ---------------+ |
                                              |  | +---------+++       |||    |
                                              |  +---------------------+++    |
                                              |  | Register Records    |||    |
                                              |  +---------------------+++    |
                                              +-------------------------------+

When the program is stored in a file, the file includes the following sections:

  • Global Data section: stores global variable data
  • Global Entry section: stores all entry points of data and functions
  • Function Unit section: stores all function units. A Function Unit includes:

    • Text section: stores the instructions to be executed
    • Data section: stores constants and variables used only in this function
    • Constant section: stores immediate values used in this function
    • Closure section: stores relocation information for captured variables

In memory, there are the following segments:

  • Global Data Stack: all functions share the same global data stack, which is used for variable storage, argument passing, and so on. The Global Data Stack works like the normal stack in x86_64 assembly. It initially stores global variable data, and function frames are then constructed on it as functions are called. A Global Data Stack can be at most 4 GiB in size. The register Reg#SS points to the base of the data stack segment currently in use. The global data stack may be duplicated and stored in a new data stack segment when a continuation, a fork, or an extremely large stack allocation is invoked; Reg#SS is then updated to point to the base of the new data stack segment. The duplicated data stack can also be used for snapshots.
  • Text Vector: stores all text segments in the program. Every function's text segment is loaded into the text vector.
  • Function Unit Vector: stores all function units in the program. Every function has its own function unit, including a text segment and a data segment. The Function Unit Vector maps a function index to a pointer to the corresponding function unit. If a function unit is not referenced by any pointer, its slot is freed and can be reused. A Function Unit includes:

    • Text Segment: a pointer to the text segment in the text vector.
    • Data Segment: every function has its own data segment, storing literal data and captured data, the latter being pointers to the global data stack.

      • Literal Section: literals are constants that may be used in the function or as instruction parameters but cannot be embedded in an instruction directly.
      • Capture Section: captured data are pointers to the global data stack or to heap data. Every pointer must be pushed into the capture section and deleted when the function no longer holds it. If a parameter is a captured pointer, the pointer must be pushed into the capture section. No pointer may be stored in the global data stack except in the argument passing area.
      • Data Section: other data used in the function.

    The Literal Section is loaded from the file into the data segment directly. The Capture Section is constructed when the function unit is constructed. The Data Section is loaded from the file into the data segment directly.

  • Execution Stack: stores function call frame pointers. Every function call pushes a pointer to the corresponding function unit in the function unit vector onto the execution stack. The Execution Stack can be duplicated and stored in a new execution stack segment when a continuation or a fork is invoked; Reg#ES is then updated to point to the new execution stack segment.
  • Register Records: store all register values, which change as instructions are executed. Register records are saved and restored when snapshots or exception handling are invoked.
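
The slot behaviour of the Function Unit Vector (index to unit mapping, with freed slots reused) can be sketched in Python. This is a minimal model under assumptions: the class FunctionUnitVector and its methods add, get and release are hypothetical, function units are plain dictionaries, and the decision that a unit is no longer referenced is left to the caller instead of being tracked automatically.

```python
class FunctionUnitVector:
    """Maps function indices to function units; freed slots are reused."""

    def __init__(self):
        self._slots = []  # a function unit, or None for a free slot
        self._free = []   # indices of slots that can be reused

    def add(self, unit):
        """Store a unit and return its function index."""
        if self._free:
            index = self._free.pop()
            self._slots[index] = unit
        else:
            index = len(self._slots)
            self._slots.append(unit)
        return index

    def get(self, index):
        """Look up the unit for a function index."""
        unit = self._slots[index]
        if unit is None:
            raise LookupError(f"function index {index} is free")
        return unit

    def release(self, index):
        """Free a slot once no pointer references its function unit."""
        self._slots[index] = None
        self._free.append(index)


units = FunctionUnitVector()
first = units.add({"text": 0, "data": {"literal": [], "capture": []}})
second = units.add({"text": 16, "data": {"literal": [], "capture": []}})
units.release(first)
reused = units.add({"text": 32, "data": {"literal": [], "capture": []}})
```

After the release, the third unit takes over the index of the first one, so indices stay dense without moving the remaining units.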
2.1.1.17.3  Register [Register]

Registers fall into two kinds:

  • General-Purpose Registers
  • Special-Purpose Registers

All registers are 64 bits long, and each can be identified by a 6-bit number.

General-purpose registers can be accessed freely by the user and can be updated by any instruction. Changing a general-purpose register does not affect any other register or the execution status of the virtual machine.

Special-purpose registers reflect the execution status of the virtual machine. Their values may be changed automatically by the virtual machine or as a side effect of instructions. The read-write abilities given below for each special-purpose register are only suggestions.

Changing special-purpose registers directly is not recommended, although every special-purpose register can be read and written like a general-purpose register.

A register name starts with "Reg#", followed by its identifier, which is a number or a string.

General-purpose registers have only numbers as their names, for example Reg#0, Reg#1, … There are only 32 general-purpose registers available.

Special-purpose registers have their own names and their own codes (numbers):

  • Result discarding:

    • Ignore: Reg#Ign, code 0x3f, any value moved into it is ignored.
  • Arithmetic computation and results:

    • Accumulator: Reg#A, code 0x3e, for the result of ADD, SUB, MUL, and DIV, or a return value
    • Counter: Reg#C, code 0x3d, for loop counts
    • Remainder: Reg#R, code 0x3c, for the remainder of DIV, or a return value
  • Execution locating:

    • Program Counter: Reg#PC, code 0x3b, for the next instruction to be executed
    • Execution Stack Pointer: Reg#EP, code 0x3a, for the current execution frame in the execution stack
    • Execution Segment: Reg#ES, code 0x39, for the execution stack segment
  • Stack locating:

    • Stack Base Pointer: Reg#BP, code 0x38, for the current stack frame base
    • Stack Top Pointer: Reg#SP, code 0x37, for the current stack frame top
    • Stack Segment: Reg#SS, code 0x36, for the stack segment
  • Condition reflecting:

    • Flags: Reg#FLAGS, code 0x35, for flags after instruction execution
    • Tests: Reg#TESTS, code 0x34, for the test condition
2.1.1.17.3.1 General-Purpose Registers: Reg#n, n for number

The general-purpose registers run from Reg#0 to Reg#1F (31) and can be accessed freely by the user.

2.1.1.17.3.2 Ignore: Reg#Ign

Ignores every value moved into it.

Assign-only, special-purpose register that can be accessed by the user. Reading from it always yields zero.
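
A register file with these codes and the Reg#Ign behaviour can be sketched in Python. The codes are the ones listed earlier in this section; everything else is an assumption made for illustration: the names SPECIAL, MASK64 and RegisterFile are hypothetical, and truncating written values to 64 bits is a modelling choice the text does not specify.

```python
MASK64 = (1 << 64) - 1

# Special-purpose register codes as listed in this section.
SPECIAL = {
    "Ign": 0x3F, "A": 0x3E, "C": 0x3D, "R": 0x3C,
    "PC": 0x3B, "EP": 0x3A, "ES": 0x39,
    "BP": 0x38, "SP": 0x37, "SS": 0x36,
    "FLAGS": 0x35, "TESTS": 0x34,
}


class RegisterFile:
    """64 slots addressed by a 6-bit code (0x00..0x3F)."""

    def __init__(self):
        self._values = [0] * 64

    def write(self, code, value):
        if code == SPECIAL["Ign"]:
            return  # values moved into Reg#Ign are discarded
        # Assumption: values wrap around to 64 bits.
        self._values[code] = value & MASK64

    def read(self, code):
        if code == SPECIAL["Ign"]:
            return 0  # reading Reg#Ign always yields zero
        return self._values[code]


regs = RegisterFile()
regs.write(0x1F, 5)              # Reg#1F, the last general-purpose register
regs.write(SPECIAL["Ign"], 7)    # silently dropped
regs.write(SPECIAL["A"], 9)
```

Codes 0x00 to 0x1F address the general-purpose registers, so a single 6-bit field is enough to name any register in an instruction.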

2.1.1.17.3.3 Accumulator, Counter, Remainder: Reg#A, Reg#C, Reg#R

The result of every ADD, SUB, MUL, and DIV may be assigned to Reg#A, the accumulator.

Loop counts may rely on Reg#C, the counter. If the LOOP instruction is used, Reg#C is decremented by one automatically.

The remainder of DIV may be assigned to Reg#R, the remainder register.

These are read-write, special-purpose registers that can be accessed by the user.

Return values do not have to be passed between functions on the stack; in that case, Reg#A and Reg#R are used for return value passing.
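
How DIV and LOOP touch these three registers can be sketched in Python. This is a toy model under assumptions: registers are kept in a plain dictionary, the functions div and loop are hypothetical, operands are treated as non-negative integers, and the rule that LOOP keeps looping while Reg#C is non-zero is borrowed from x86 and not stated in the text.

```python
def div(regs, dividend, divisor):
    """Model of DIV: quotient goes to Reg#A, remainder to Reg#R."""
    regs["A"], regs["R"] = divmod(dividend, divisor)


def loop(regs):
    """Model of LOOP: decrement Reg#C by one.

    Returns True while the loop should continue (assumption: the loop
    ends once Reg#C reaches zero).
    """
    regs["C"] -= 1
    return regs["C"] != 0


regs = {"A": 0, "C": 3, "R": 0}
div(regs, 17, 5)

iterations = 0
while True:
    iterations += 1      # loop body
    if not loop(regs):
        break
```

Since both Reg#A and Reg#R are filled by one DIV, a function can hand back a quotient and a remainder together without touching the stack.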

2.1.1.17.3.4 Program Counter, Execution Stack Pointer: Reg#PC, Reg#EP

Reg#PC points to the next instruction to be executed in the current function frame.

Reg#EP points to the current execution frame in the execution stack.

Both are also used to provide unwind information.

These registers are intended to be read-only, and writing to them directly is not recommended, because a write affects the execution status of the virtual machine. If the value written into Reg#PC lies within the text segment of the current function frame, the next instruction to be executed changes accordingly. If the value written into Reg#EP is out of range, the virtual machine raises an exception. If it is less than the current top of the execution stack, the virtual machine unwinds the execution stack to the target frame. If it is larger than the current top of the execution stack, the virtual machine raises an exception.
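
The rules for writing Reg#EP can be sketched in Python. This is a simplified model under assumptions: the execution stack is a Python list, Reg#EP is treated as an index into it (the real register holds a pointer), and VMError and write_ep are hypothetical names.

```python
class VMError(Exception):
    """Stand-in for an exception raised by the virtual machine."""


def write_ep(execution_stack, new_ep):
    """Apply a write to Reg#EP and return the resulting frame index.

    execution_stack is a list of frames; its last element is the
    current top of the execution stack.
    """
    top = len(execution_stack) - 1
    if new_ep < 0 or new_ep > top:
        # Out of range, or larger than the current top.
        raise VMError(f"invalid Reg#EP value {new_ep}")
    # Less than the current top: unwind to the target frame.
    del execution_stack[new_ep + 1:]
    return new_ep


frames = ["main", "f", "g"]
current = write_ep(frames, 1)   # unwinds "g"
```

Writing the index of the current top leaves the stack unchanged, which matches the absence of any unwinding in that case.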

2.1.1.17.3.5 Stack Segment, Stack Pointer, Base Pointer: Reg#SS, Reg#SP, Reg#BP

Reg#SS references the Data Stack Segment, with offsets up to 2^32 (i.e., 4 GiB). In most cases Reg#SS does not change, since the data stack works like a normal stack of small size.

Reg#SP references the stack top of the current function frame.

Reg#BP references the stack base of the current function frame.

Reg#SP and Reg#BP are never less than 0 and never larger than the segment length, though they are 64-bit pointers (52 bits for addressing).

These are read-write registers, but writing to them directly is not recommended. The values of Reg#SP and Reg#BP change automatically when push / pop / call / ret instructions are executed. The value of Reg#SS usually does not change, unless the user allows an extremely large stack to be allocated dynamically.

The user can write Reg#SP and Reg#BP directly to change the stack frame, and can write Reg#SS directly to change the stack segment base. If Reg#SS is changed and not restored before returning from the function, other function frames may behave incorrectly.

2.1.1.17.3.6 Flags, Test: Reg#FLAGS, Reg#TESTS

After any instruction, Reg#FLAGS is set according to the execution result.

The :TEST cond, jmp instruction sets Reg#TESTS according to cond and then checks the result of :AND Reg#TESTS, Reg#FLAGS. If cond is true, execution jumps to dst.

There are some literals for cond:

  • Test#g
  • Test#ng
  • Test#l
  • Test#nl
  • Test#e
  • Test#o
  • Test#no

Any literal 16-bit value is also acceptable.

The meaning of the flag bits in Reg#FLAGS is as follows:

0x
              00              08              10              18              20
                                                                              40
              |0 0 0 0 0 0 0 0|0 0 1 1 1 1 1 1|1 1 1 1 2 2 2 2|2 2 2 2 2 2 3 3|
            => 3 3 3 3 3 3 3 3|4 4 4 4 4 4 4 4|4 4 5 5 5 5 5 5|5 5 5 5 6 6 6 6|
Decimal       |0 1 2 3 4 5 6 7|8 9 0 1 2 3 4 5|6 7 8 9 0 1 2 3|4 5 6 7 8 9 0 1|
            => 2 3 4 5 6 7 8 9|0 1 2 3 4 5 6 7|8 9 0 1 2 3 4 5|6 7 8 9 0 1 2 3|
--------------------------------------------------------------------------------
              00                              10                              20
                                                                              40
Default       |C|1|P|0|A|0|Z|S|T|I|D|O|IOP|N|0|R|V|A|V*V*I|                   |
              |F| |F| |F| |F|F|F|F|F|F|L  |T| |F|M|C|F|P|D|                   |
            =>                                                                |
              | Exception code                                                |
* VF <- VIF; VP <- VIP
  • CF: Carry Flag
  • PF: Parity Flag
  • AF: Auxiliary Carry Flag
  • ZF: Zero Flag
  • SF: Sign Flag
  • TF: Trap Flag

Exception codes are passed to the exception interrupt handler when an exception is raised.

Write operations on these registers have no effect.
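
The check performed by :TEST can be sketched in Python as a mask applied to the flags. Several points are assumptions: the bit positions are read off the layout diagram with bit 0 taken as the least significant bit, the condition is taken to hold when the AND result is non-zero, and the function name condition_holds is hypothetical. The masks behind the Test# literals are not given in the text, so they are not modelled here.

```python
# Flag bit positions read from the layout diagram (assumed LSB = bit 0).
CF = 1 << 0  # Carry Flag
PF = 1 << 2  # Parity Flag
AF = 1 << 4  # Auxiliary Carry Flag
ZF = 1 << 6  # Zero Flag
SF = 1 << 7  # Sign Flag
TF = 1 << 8  # Trap Flag


def condition_holds(flags, cond):
    """Model of :TEST: load cond into Reg#TESTS, AND it with Reg#FLAGS."""
    tests = cond & 0xFFFF  # cond may be any literal 16-bit value
    # Assumption: the jump is taken when any tested flag is set.
    return (tests & flags) != 0


flags = ZF | CF                          # e.g. after a result of zero with carry
zero_tested = condition_holds(flags, ZF)
sign_tested = condition_holds(flags, SF)
```

Because cond is only a mask, a single :TEST can check several flags at once by OR-ing their bits together.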

2.1.1.17.4  Pointer Specification [Pointer]

A pointer in this virtual machine is a 64-bit unsigned integer that is stored in the capture section of a function unit.

A pointer uses 46 bits for the address and 6 bits to identify the pointer type; the remaining 12 bits are reserved for future use. The address is divided into two parts: the Pointer Base Address (PBA) and the Segment.

0x
              00              08              10              18              20
                                                                              40
              |0 0 0 0 0 0 0 0|0 0 1 1 1 1 1 1|1 1 1 1 2 2 2 2|2 2 2 2 2 2 3 3|
            => 3 3 3 3 3 3 3 3|4 4 4 4 4 4 4 4|4 4 5 5 5 5 5 5|5 5 5 5 6 6 6 6|
Decimal       |0 1 2 3 4 5 6 7|8 9 0 1 2 3 4 5|6 7 8 9 0 1 2 3|4 5 6 7 8 9 0 1|
            => 2 3 4 5 6 7 8 9|0 1 2 3 4 5 6 7|8 9 0 1 2 3 4 5|6 7 8 9 0 1 2 3|
--------------------------------------------------------------------------------
              00                              10                              20
                                              30              38              40
Default       | Pointer Base Address                                          |
            =>  PBA(c.)       | Segment   | Type      |                       |
* VF <- VIF; VP <- VIP

The Type field defines the type of the pointer. The following pointer types are defined:

  • Heap Pointer: type code 0x00, points to a heap-allocated memory block.
  • Stack Pointer: type code 0x01, points to a global data stack location.
  • Function Pointer: type code 0x02, points to a function unit entry point.
  • Text Pointer: type code 0x03, points to a text vector.
  • Data Pointer: type code 0x04, points to the data section of a function unit.
  • Constant Pointer: type code 0x05, points to the constant section of a function unit.
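
The field layout above can be sketched as a pair of pack/unpack helpers. This is only an illustration: it assumes that bit 0 in the diagram is the least significant bit, and the names pack_pointer, unpack_pointer and POINTER_TYPES are hypothetical, not part of the virtual machine.

```python
# Field widths taken from the layout above: 40-bit PBA, 6-bit Segment, 6-bit Type.
# ASSUMPTION: bit 0 of the diagram is the least significant bit.
PBA_BITS, SEGMENT_BITS, TYPE_BITS = 40, 6, 6
SEGMENT_SHIFT = PBA_BITS                  # Segment starts at bit 40
TYPE_SHIFT = PBA_BITS + SEGMENT_BITS      # Type starts at bit 46

POINTER_TYPES = {
    0x00: "heap", 0x01: "stack", 0x02: "function",
    0x03: "text", 0x04: "data", 0x05: "constant",
}


def pack_pointer(pba, segment, type_code):
    """Build a 64-bit pointer; the 12 reserved upper bits stay zero."""
    if not 0 <= pba < (1 << PBA_BITS):
        raise ValueError("PBA does not fit in 40 bits")
    if not 0 <= segment < (1 << SEGMENT_BITS):
        raise ValueError("segment does not fit in 6 bits")
    if type_code not in POINTER_TYPES:
        raise ValueError("undefined pointer type")
    return pba | (segment << SEGMENT_SHIFT) | (type_code << TYPE_SHIFT)


def unpack_pointer(ptr):
    """Split a 64-bit pointer into (pba, segment, type_code)."""
    pba = ptr & ((1 << PBA_BITS) - 1)
    segment = (ptr >> SEGMENT_SHIFT) & ((1 << SEGMENT_BITS) - 1)
    type_code = (ptr >> TYPE_SHIFT) & ((1 << TYPE_BITS) - 1)
    return pba, segment, type_code
```

For example, pack_pointer(0x1234, 3, 0x02) builds a function pointer into segment 3, and unpack_pointer recovers the three fields from it.
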
2.1.1.17.5 Interrupt and Exception Handling

Interrupts are handled through the interrupt dispatch table. The first 128 entries of the function unit vector are reserved for interrupt handling. When an interrupt is invoked, the virtual machine performs the following steps:

  1. Store the current execution status by pushing all registers onto the global data stack
  2. Invoke the interrupt handler function unit from the interrupt dispatch table
  3. After the interrupt handler function unit returns, restore the previous execution status by popping all registers from the global data stack

Exceptions are handled by invoking the exception handler function unit, which is a special interrupt handler. The default exception handler function unit is at index 0 in the function unit vector.

Interrupt handlers must be provided by the program. If no interrupt handler is provided, the virtual machine writes the interrupt dispatch table entry so that it points to the default exception handler, a function unit that displays exception trace information and halts the virtual machine.

The exception handling process is described under the raise instruction.
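
The three dispatch steps and the fallback to the default handler can be sketched as follows. This is a toy model, not the virtual machine's implementation: the class ToyVM, its three-register set and the callable-based handlers are all hypothetical stand-ins for function units.

```python
TABLE_SIZE = 128            # the first 128 function unit entries are reserved
DEFAULT_HANDLER_INDEX = 0   # the default exception handler sits at index 0


class ToyVM:
    def __init__(self, handlers):
        # Hypothetical register subset; the real register file is larger.
        self.registers = {"A": 0, "R": 0, "C": 0}
        self.data_stack = []
        self.halted = False
        # Entries without a program-provided handler point to the default handler.
        default = handlers.get(DEFAULT_HANDLER_INDEX, self.default_exception_handler)
        self.table = [handlers.get(i, default) for i in range(TABLE_SIZE)]

    def default_exception_handler(self, vm):
        # Stand-in for "display exception trace information and halt".
        vm.halted = True

    def interrupt(self, index):
        names = list(self.registers)
        for name in names:                    # 1. push all registers
            self.data_stack.append(self.registers[name])
        self.table[index](self)               # 2. invoke the handler
        for name in reversed(names):          # 3. pop them in reverse order
            self.registers[name] = self.data_stack.pop()
```

A handler may change registers freely; after interrupt returns, the values saved in step 1 are back in place.
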

2.1.1.17.6 Model

The execution model of the virtual machine has the following steps:

  1. Load function unit into function unit vector
  2. Initialize global data stack
  3. Initialize execution stack
  4. Initialize register records
  5. Start execution from main function

When a function call is invoked, the virtual machine performs the following steps:

  1. Push the current function frame pointer onto the execution stack
  2. Create a new function frame in the global data stack
  3. Push local variables onto the global data stack
  4. Start execution from the called function

When a function return is invoked, the virtual machine performs the following steps:

  1. Move the return value into Reg#A and Reg#R. If the return value is larger than two registers can represent, move it into the pre-allocated space in the global data stack, and move the pointer to that space into Reg#A.
  2. Pop current function frame from execution stack.
  3. Clean up current function frame in global data stack.
  4. Resume execution from previous function frame.

When a snapshot exception is invoked, the virtual machine performs the following steps:

  1. Duplicate current global data stack segment, execution stack segment, and register records.
2.1.1.17.7  Call Convention [Call]

When a function is about to be called, the caller must do the following steps:

  1. Reserve space for the return value in the global data stack
  2. Push the function arguments onto the global data stack, with the leftmost argument pushed last
  3. Move the return value address into Reg#A
  4. Invoke the call instruction with the function

When a function is called, the callee must do the following steps:

  1. Create a new function frame in the global data stack, storing the previous stack base pointer and stack top pointer
  2. Store the return value address from Reg#A into the function frame
  3. Push local variables onto the global data stack

When a function is about to return, the callee must do the following steps:

  1. Move the return value accordingly: if the function signature returns the value by register, move it into Reg#A and Reg#R; otherwise, move it into the pre-allocated space in the global data stack
  2. Restore the previous stack base pointer and stack top pointer from the function frame
  3. Invoke the ret instruction

When a function has returned, the caller must do the following steps:

  1. Clean up function arguments from global data stack
  2. Resume execution from previous function frame
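
The convention above can be traced with a small simulation. This is a sketch under several assumptions: the global data stack is a Python list whose indexes serve as addresses, the callee is an ordinary Python function (so pushing locals is skipped), and the names Machine, call and _enter are hypothetical. Releasing the return slot at the end is an addition of this sketch, not a step listed above.

```python
class Machine:
    def __init__(self):
        self.stack = []   # global data stack; a list index stands in for an address
        self.reg_a = 0    # Reg#A
        self.reg_r = 0    # Reg#R
        self.base = 0     # stack base pointer (hypothetical name)

    def call(self, callee, args, by_register):
        ret_slot = len(self.stack)
        self.stack.append(None)              # caller 1: reserve return value space
        for arg in reversed(args):           # caller 2: leftmost argument pushed last
            self.stack.append(arg)
        self.reg_a = ret_slot                # caller 3: return value address in Reg#A
        self._enter(callee, len(args), by_register)   # caller 4: call
        del self.stack[ret_slot + 1:]        # after return: clean up the arguments
        slot_value = self.stack.pop()        # release the return slot (sketch only)
        return self.reg_a if by_register else slot_value

    def _enter(self, callee, argc, by_register):
        frame = len(self.stack)
        # callee 1 and 2: frame holds saved base, saved top, return value address
        self.stack.extend([self.base, frame, self.reg_a])
        self.base = frame
        # The leftmost argument was pushed last, so it sits just below the frame.
        args = [self.stack[frame - 1 - i] for i in range(argc)]
        value = callee(*args)
        if by_register:
            self.reg_a, self.reg_r = value, 0
        else:
            self.stack[self.stack[self.base + 2]] = value
        saved_base, saved_top = self.stack[self.base], self.stack[self.base + 1]
        del self.stack[saved_top:]           # restore the stack top pointer
        self.base = saved_base               # restore the stack base pointer
```

Calling Machine().call(lambda x, y: x - y, [10, 3], by_register=True) pushes 3 and then 10, so the callee sees 10 as its first argument.
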
2.1.1.17.8  Instruction Specification [Instruction]

All instructions in the virtual machine have a fixed length of 32 bits.

Instructions have four types of addressing method:

  • None addressing: no parameter is accepted
  • Register addressing: parameter is a register
  • Immediate addressing: parameter is a literal value
  • Memory addressing: parameter is a memory address

Based on the addressing methods above, instructions are divided into the following categories:

  • Zero operand instruction: no parameter

  • Register operand instruction: only register parameter

  • Immediate operand instruction: only literal parameter

  • Memory operand instruction: only memory address parameter

  • Register-Register operand instruction: two register parameters

  • Register-Immediate operand instruction: one register parameter, one literal parameter

  • Immediate-Register operand instruction: one literal parameter, one register parameter

  • Register-Memory operand instruction: one register parameter, one memory address parameter

  • Memory-Register operand instruction: one memory address parameter, one register parameter

  • Memory-Memory operand instruction: two memory address parameters

  • Immediate-Immediate operand instruction: two literal parameters

  • Memory-Immediate operand instruction: one memory address parameter, one literal parameter

  • Register-Register-Register operand instruction: three register parameters

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
              20          1a          18      10            09    06          00
Default       | register  | register  | register  |         | typ | operator  |
              | register  | register  |                     | typ | operator  |
              | register  |                                 | typ | operator  |
--------------------------------------------------------------------------------
Zero          | flags                                       | typ | operator  |
Register      | register  | flags                           | typ | operator  |
Immediate     | literal                       | flags       | typ | operator  |
RR            | register  | register  | flags               | typ | operator  |
RI            | register  | literal                     |   | typ | operator  |
IR            | register  | literal                     |   | typ | operator  |
II            | literal       | literal       | flags       | typ | operator  |
RRR           | register  | register  | register  | flags   | typ | operator  |
RRI           | register  | register  | literal       |flags| typ | operator  |
RIR           | register  | register  | literal       |flags| typ | operator  |

* register: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction pattern, distinguished by instruction type
* RRI and RIR are two variants of the same instruction pattern, distinguished by instruction type
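
As an illustration of the RRR row, the fields can be packed into and extracted from a 32-bit word as below. The bit positions are read off the diagram (three 6-bit registers in bits 31 to 14, 5 flag bits, a 3-bit type code and a 6-bit operator code); the function names and the operator and type values used in the test are hypothetical.

```python
def encode_rrr(r1, r2, r3, typ, operator, flags=0):
    """Pack an RRR-format instruction into a 32-bit word."""
    fields = (("r1", r1, 6), ("r2", r2, 6), ("r3", r3, 6),
              ("flags", flags, 5), ("typ", typ, 3), ("operator", operator, 6))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return (r1 << 26) | (r2 << 20) | (r3 << 14) | (flags << 9) | (typ << 6) | operator


def decode_rrr(word):
    """Extract the RRR fields from a 32-bit word."""
    return {
        "r1": (word >> 26) & 0x3F,
        "r2": (word >> 20) & 0x3F,
        "r3": (word >> 14) & 0x3F,
        "flags": (word >> 9) & 0x1F,
        "typ": (word >> 6) & 0x7,
        "operator": word & 0x3F,
    }
```

The other formats differ only in how bits 31 to 9 are split, so the same shift-and-mask approach applies to them.
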
2.1.1.17.8.1  Instruction Set [IS]

The instruction set includes the following instructions:

  • Interrupt and Exception Handling

    • Int: interrupt invoke instruction

      • I, :int idx : invoke the interrupt with index idx
      • R, :int reg : invoke the interrupt with the address stored in register reg
  • Snapshot Exception Handling

    • Snap: snapshot exception invoke instruction

      • :snap : invoke a snapshot exception
    • Raise: raise exception instruction

      • I, :raise code : raise an exception with code code
  • Data Management

    • Mov: move data instruction

      • RR, :mov s dst, shl d(src) : move data from src to dst, shifted left by shl bits and padded with 0 or 1 as selected by + or -.
      • RI, :mov offset dst, val : move immediate value val to dst. offset can be low16, high16, low16h, or high16h, selecting the low 16 or high 16 bits of the low 32 bits of dst, or the low 16 or high 16 bits of the high 32 bits of dst. E.g., :mov l Reg#1, 0xffff assigns 0xffff to the low 16 bits of Reg#1, and :mov l Reg#1, 0x8fff assigns 0x8fff to the low 16 bits of Reg#1, with other version of :mov
      • RR, :mov offset dst, ptr[src] : dereference memory address src and move the data to dst. ptr can be qword (0), bytes (1), word (2), or dword (4) for the data size; offset can be 0, 1, 2, or 4 for the target data offset. E.g., mov ah, ptr[rbx] in x86_64 assembly can be represented as :mov 1 ah, bytes ptr[Reg#1], while mov eax, ptr[rbx] can be represented as :mov 0 eax, dword ptr[Reg#1]
      • RR, :mov ptr[dst], offset src : move data from src to memory address dst. ptr can be qword (0), bytes (1), word (2), or dword (4) for the data size; offset can be 0, 1, 2, or 4 for the source data offset. E.g., mov ptr[rbx], ah in x86_64 assembly can be represented as :mov 1[Reg#1], 1 ah, while mov ptr[rbx], eax can be represented as :mov 4[Reg#1], 0 eax
      • RR, :mov ptr [dst], [src] : move data from memory address src to memory address dst. ptr can be qword (0), bytes (1), word (2), or dword (4) for the data size
      • RRI, :mov dst, ptr[base + offset] : move data from the memory address calculated as the base register plus the immediate offset to dst
      • RIR, :mov ptr[base + offset], src : move data from src to the memory address calculated as the base register plus the immediate offset
    • LSD: load / save data instruction

      • I, :lsd op idx : load or save data between the global data stack and a register, using index idx

      op can be one of the following:

      • load: load data from the global data stack to Reg#A
      • save: save data from Reg#A to the global data stack
      • loadr: load data from the global data stack to Reg#R
      • saver: save data from Reg#R to the global data stack
      • loadc: load pre-defined data to Reg#A
  • Arithmetic Computation

    • OpI: arithmetic integer computation

      • RR, :opi op dst, src : perform arithmetic operation op on integers dst and src, store the result into Reg#A and the carry or remainder into Reg#R
      • RI, :opi op dst, val : perform arithmetic operation op on integer dst and immediate value val, store the result into Reg#A and the carry or remainder into Reg#R
      • IR, :opi op val, src : perform arithmetic operation op on immediate value val and integer src, store the result into Reg#A and the carry or remainder into Reg#R
      • RR, :opi op dst, ptr[src] : perform arithmetic operation op on integer dst and the integer at memory address src, store the result into Reg#A and the carry or remainder into Reg#R
      • RR, :opi op ptr[dst], src : perform arithmetic operation op on the integer at memory address dst and integer src, store the result into Reg#A and the carry or remainder into Reg#R
      • RR, :opi op ptr[dst], [src] : perform arithmetic operation op on the integers at memory addresses dst and src, store the result into Reg#A and the carry or remainder into Reg#R

      op can be one of the following:

      • add: addition
      • sub: subtraction
      • mul: multiplication
      • div: division
    • OpU: arithmetic unsigned integer computation

      • RR, :opu op dst, src : perform arithmetic operation op on unsigned integers dst and src, store the result into Reg#A and the carry or remainder into Reg#R
      • RI, :opu op dst, val : perform arithmetic operation op on unsigned integer dst and immediate value val, store the result into Reg#A and the carry or remainder into Reg#R
      • IR, :opu op val, src : perform arithmetic operation op on immediate value val and unsigned integer src, store the result into Reg#A and the carry or remainder into Reg#R
      • RR, :opu op dst, ptr[src] : perform arithmetic operation op on unsigned integer dst and the unsigned integer at memory address src, store the result into Reg#A and the carry or remainder into Reg#R
      • RR, :opu op ptr[dst], src : perform arithmetic operation op on the unsigned integer at memory address dst and unsigned integer src, store the result into Reg#A and the carry or remainder into Reg#R
      • RR, :opu op ptr[dst], [src] : perform arithmetic operation op on the unsigned integers at memory addresses dst and src, store the result into Reg#A and the carry or remainder into Reg#R

      op can be one of the following:

      • add: addition
      • sub: subtraction
      • mul: multiplication
      • div: division
    • OpF: arithmetic floating-point computation

      • RR, :opf op dst, src : perform arithmetic operation op on floating-point values dst and src, store the result into Reg#A
      • RR, :opf op dst, ptr[src] : perform arithmetic operation op on floating-point dst and the floating-point value at memory address src, store the result into Reg#A
      • RR, :opf op ptr[dst], src : perform arithmetic operation op on the floating-point value at memory address dst and floating-point src, store the result into Reg#A
      • RR, :opf op ptr[dst], [src] : perform arithmetic operation op on the floating-point values at memory addresses dst and src, store the result into Reg#A
      • RR, :opf fmod dst, src : perform the floating-point modulus operation on dst and src, store the result into Reg#A
      • RR, :opf fmod ptr [dst], [src] : perform the floating-point modulus operation on the values at memory addresses dst and src, store the result into Reg#A

      op can be one of the following:

      • fadd: floating-point addition
      • fsub: floating-point subtraction
      • fmul: floating-point multiplication
      • fdiv: floating-point division
    • OpB: arithmetic bitwise computation

      • RR, :opb op dst, src : perform bitwise operation op on dst and src, store the result into Reg#A
      • RI, :opb op dst, val : perform bitwise operation op on dst and immediate value val, store the result into Reg#A
      • RR, :opb op dst, ptr[src] : perform bitwise operation op on dst and the value at memory address src, store the result into Reg#A
      • RR, :opb op ptr[dst], src : perform bitwise operation op on the value at memory address dst and src, store the result into Reg#A
      • RR, :opb op ptr[dst], [src] : perform bitwise operation op on the values at memory addresses dst and src, store the result into Reg#A
      • RI, :opb op ptr[dst], val : perform bitwise operation op on the value at memory address dst and immediate value val, store the result into Reg#A

      op can be one of the following:

      • and: bitwise AND
      • or: bitwise OR
      • xor: bitwise XOR
      • not: bitwise NOT
    • OpS: arithmetic shift computation

      • RR, :ops op dst, src : perform shift operation op on dst by src bits, store the result into Reg#A
      • RR, :ops op dst, ptr[src] : perform shift operation op on dst by the number of bits stored at memory address src, store the result into Reg#A
      • RR, :ops op ptr[dst], src : perform shift operation op on the value at memory address dst by src bits, store the result into Reg#A
      • RR, :ops op ptr[dst], [src] : perform shift operation op on the value at memory address dst by the number of bits stored at memory address src, store the result into Reg#A

      op can be one of the following:

      • shl: shift left
      • shr: shift right
      • sal: shift arithmetic left
      • sar: shift arithmetic right
      • rol: rotate left
      • ror: rotate right
      • rcl: rotate through carry left
      • rcr: rotate through carry right
  • Condition Test and Branch

    • Test: condition test instruction

      • II, :test cond, jmp : test condition cond; if true, jump to the near address with offset jmp
  • Control Flow Jump

    • Jmp: control flow jump instruction

      • I, :jmp:near offset : jump to the near address with offset offset
      • R, :jmp:short dst : jump to the short address stored in register dst
      • RI, :jmp:far segment : offset : jump to far address offset in function unit segment
  • Loop Control

    • Loop: loop control instruction

      • I, :loop offset : decrement Reg#C by one; if it is not zero, jump to the near address with offset offset
  • Function Call and Return

    • Call: function call instruction

      • I, :call idx : call the function with index idx in the function unit vector
      • R, :call dst : call the function with the address stored in register dst
    • Ret: function return instruction

      • :ret : return from the current function
    • IRet: interrupt return instruction

      • :iret : return from an interrupt
    • RegF: register new function instruction

      • RR, :regf skip, len : register a new function unit with code length len in bytes, skipping the first skip bytes in the global data stack
  • Stack Management

    • Stack: stack management instruction

      • I, :stack alloc size : allocate size bytes of stack space
      • I, :stack free size : free size bytes of stack space
      • Zero, :stack clear : clear the current function stack frame
      • Zero, :stack dump : dump the current stack frame information for debugging
      • Zero, :stack create : create a new function stack frame
      • Zero, :stack destroy : destroy the current function stack frame and return to the previous function stack frame
      • I, :stack duplicate idx : duplicate the current stack segment, update Reg#SS to point to the new stack segment, and store the previous stack segment pointer into the global data stack at index idx
      • I, :stack restore idx : restore the previous stack segment from the global data stack at index idx, and update Reg#SS to point to the restored stack segment
    • Push: push data onto stack instruction

      • R, :push src : push data from register src onto the stack
      • I, :push val : push immediate value val onto the stack
      • R, :push ptr[src] : push data from memory address src onto the stack
    • Pop: pop data from stack instruction

      • R, :pop dst : pop data from the stack into register dst
      • R, :pop ptr[dst] : pop data from the stack into memory address dst
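
The "result into Reg#A, carry or remainder into Reg#R" rule for unsigned arithmetic can be illustrated with a small model. This is a sketch under assumptions: registers are taken to be 64 bits wide, and the reading of "carry" for sub (borrow) and mul (high half of the product) is a guess that the text above does not confirm. The function name opu is hypothetical.

```python
MASK64 = (1 << 64) - 1   # ASSUMPTION: 64-bit registers


def opu(op, dst, src):
    """Return the pair (Reg#A, Reg#R) for an unsigned operation on two values."""
    if op == "add":
        total = dst + src
        return total & MASK64, total >> 64            # carry-out goes to Reg#R
    if op == "sub":
        return (dst - src) & MASK64, int(dst < src)   # borrow goes to Reg#R (assumed)
    if op == "mul":
        product = dst * src
        return product & MASK64, product >> 64        # high half goes to Reg#R (assumed)
    if op == "div":
        if src == 0:
            # The machine would raise exception code 0x05 (Division by Zero) here.
            raise ZeroDivisionError("division by zero")
        return dst // src, dst % src                  # remainder goes to Reg#R
    raise ValueError(f"unknown op {op!r}")
```

For instance, opu("div", 17, 5) yields the quotient 3 for Reg#A and the remainder 2 for Reg#R.
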
2.1.1.17.8.2  Instruction: Int [Int]

The Int instruction is used to invoke an interrupt.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Register      | register  |                                 | typ | operator  |
Immediate     | literal                       |             | typ | operator  |

* register: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction, distinguished by instruction type

There are two variants of the Int instruction:

  • Register variant:

    • Syntax: :int reg
    • Description: invoke the interrupt with the address stored in register reg
  • Immediate variant:

    • Syntax: :int idx
    • Description: invoke the interrupt with index idx

No flags are used.

Basically, the int instruction saves the current execution status and jumps to the interrupt handler function unit. All registers are pushed onto the global data stack. The interrupt handler function unit returns to the previous execution status with the iret instruction.

Apart from the 6-bit operator code and the 3-bit type code, the remaining 17 bits in the R case and the remaining 7 bits in the I case must be 0. Otherwise, an invalid instruction exception is raised automatically.

The idx in the immediate variant is a 15-bit unsigned integer. If idx is larger than 0xff, an invalid interrupt exception is raised automatically.

There are some pre-defined interrupt indexes:

  • 0x00: Exception Interrupt
  • 0x01: System Call Interrupt
  • 0xff: Halt Interrupt

For the register variant, the value in the register must be aligned to 4 bytes; an unaligned address raises an invalid interrupt exception automatically. The value in the register is treated as an address in the global data stack segment, and the interrupt handler function unit is invoked from that address.
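
The validity rules above can be sketched as two checks that return the exception code to raise, or None when the instruction is acceptable. The exception codes 0x01 and 0x08 come from the list under the Raise instruction; the function names, the order of the checks, and reading idx from the whole 16-bit literal field are assumptions of this sketch.

```python
INVALID_INSTRUCTION = 0x01   # Invalid Instruction Exception
INVALID_INTERRUPT = 0x08     # Invalid Interrupt Exception


def check_int_immediate(word):
    """Validate an immediate-variant Int word; return an exception code or None."""
    reserved = (word >> 9) & 0x7F       # 7 bits between the literal and the type code
    if reserved != 0:
        return INVALID_INSTRUCTION
    idx = (word >> 16) & 0xFFFF         # literal field in the upper half of the word
    if idx > 0xFF:
        return INVALID_INTERRUPT
    return None


def check_int_register(word, register_value):
    """Validate a register-variant Int word and the value held in its register."""
    reserved = (word >> 9) & 0x1FFFF    # 17 bits between the register and the type code
    if reserved != 0:
        return INVALID_INSTRUCTION
    if register_value % 4 != 0:         # handler address must be 4-byte aligned
        return INVALID_INTERRUPT
    return None
```

The operator and type code bits (the low 9 bits) are left unchecked here because their values for Int are not given in this section.
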

2.1.1.17.8.3  Instruction: Snap [Snap]

The Snap instruction is used to invoke a snapshot exception.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Zero          |                                           |f| typ | operator  |

* register: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction, distinguished by instruction type

The Snap instruction has no parameters.

  • Syntax: :snap flags
  • Description: invoke a snapshot exception

flags in the syntax can be full or light, indicating a full snapshot or a light snapshot. If flags is omitted, full is assumed. If flags is full, the flag bit f is set to 1, otherwise to 0. If a light snapshot is invoked, only the register records are snapshotted.

Basically, the snap instruction duplicates the current global data stack segment, execution stack segment, and register records. The snapshot exception may then be handled by the exception handler function unit. Snapshot restore must be handled explicitly by the user program.

Apart from the 6-bit operator code, the 3-bit type code, and the 1-bit flag f, the remaining 22 bits must be 0. Otherwise, an invalid instruction exception is raised automatically.
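
The difference between a full and a light snapshot can be sketched as below. The position of the f bit (bit 9, directly above the type code) is read off the diagram; the State class, its field names and the helper functions are hypothetical, and deep copies stand in for whatever duplication the machine really performs.

```python
import copy
from dataclasses import dataclass, field

F_BIT = 1 << 9   # flag f sits directly above the 3-bit type code


@dataclass
class State:
    data_stack: list = field(default_factory=list)   # global data stack segment
    exec_stack: list = field(default_factory=list)   # execution stack segment
    registers: dict = field(default_factory=dict)    # register records


def snap_is_well_formed(word):
    """The 22 bits above the f flag must all be zero."""
    return (word >> 10) == 0


def take_snapshot(state, word):
    """Duplicate everything when f is 1 (full), only the registers when f is 0 (light)."""
    if word & F_BIT:
        return copy.deepcopy(state)
    return State(registers=dict(state.registers))
```

Because the snapshot is an independent copy, later changes to the running state do not affect it, which is what an explicit restore by the user program would rely on.
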

2.1.1.17.8.4  Instruction: Raise [Raise]

The Raise instruction is used to raise an exception.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Immediate     | literal                       |             | typ | operator  |

* register: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction, distinguished by instruction type

The Raise instruction has one immediate parameter.

  • Syntax: :raise code
  • Description: raise an exception with code code

No flags are used.

Basically, the raise instruction invokes the exception handler function unit. The exception code is passed to the exception handler function unit via Reg#FLAGS. The exception handler function unit may return to the previous execution status with the iret instruction. The exception handler is a special interrupt handler.

There are some pre-defined exception codes:

  • 0x00: General Exception
  • 0x01: Invalid Instruction Exception
  • 0x02: Invalid Operand Exception
  • 0x03: Invalid Variation Exception
  • 0x04: Arithmetic Exception
  • 0x05: Division by Zero Exception
  • 0x06: Shift Count Exception
  • 0x07: Arithmetic Overflow Exception
  • 0x08: Invalid Interrupt Exception
  • 0x09: Invalid Function Call Exception
  • 0x0A: Invalid Parameter Exception
  • 0x0B: Invalid Memory Access Exception
  • 0x0C: Invalid Segment Access Exception
  • 0x0D: Invalid Register Access Exception
  • 0x0E: Stack Overflow Exception
  • 0x0F: Stack Underflow Exception
  • 0x10: Invalid Register Access Exception
  • 0x11: Snapshot Restore Exception
  • 0x12: Snapshot Exception
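
Passing the exception code through Reg#FLAGS can be sketched with two helpers. The register layout earlier in this document places the exception code in the second 32-bit half of the register; this sketch assumes that bit 0 is the least significant bit, so the code occupies bits 32 to 63. The helper names and the excerpted name table are hypothetical.

```python
EXCEPTION_SHIFT = 32         # ASSUMPTION: the exception code occupies bits 32..63
LOW_MASK = (1 << 32) - 1

# Excerpt of the pre-defined codes listed above.
EXCEPTION_NAMES = {
    0x00: "General Exception",
    0x05: "Division by Zero Exception",
    0x0E: "Stack Overflow Exception",
}


def with_exception_code(flags, code):
    """Return flags with its upper half replaced by code; status bits are kept."""
    return (flags & LOW_MASK) | ((code & LOW_MASK) << EXCEPTION_SHIFT)


def exception_code(flags):
    """Read the exception code back, as an exception handler would."""
    return (flags >> EXCEPTION_SHIFT) & LOW_MASK
```

A handler could then call exception_code on the value of Reg#FLAGS and look the result up in a table such as EXCEPTION_NAMES.
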
2.1.1.17.8.5  Instruction: Mov [Mov]

The Mov instruction is used to move data between registers and memory.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
RR            | register  | register  |k| l |d| shl       |s| typ | operator  |
RR(1)         | register  | register  |             |o  |ss | typ | operator  |
RI            | register  | literal                     |o  | typ | operator  |
IR            | register  | literal                     |o  | typ | operator  |
RRI           | register  | register  | literal       | |ss | typ | operator  |

* registers: 6 bits register code
* literal: literal constant value, either in 16 bits or 8 bits integer.
* flags: reserved space, for instruction extension use
* RI and IR are two variant of same instruction, distinguish by instruction type

The Mov instruction has the following variants:

  • Register-Register variant:

    • Syntax: :mov s dst, shl d(src)
    • Type code: 0, RR
    • Description: move data from src to dst, shifted left by shl bits, padding with 0 or 1 as selected by + or -. + is the default if omitted.

      • dst: target register
      • src: source register
      • shl: shift amount in bits, from 0 to 63, optional, defaults to 0 if omitted
      • s: + or -, padding with 0 or 1, optional; if provided, the shift uses manual padding
      • d: shift direction and type, optional, defaults to logical left shift if omitted: < for logical left shift, > for logical right shift, >> for arithmetic right shift, rol for rotate left, ror for rotate right, lp for manual padding shift
    • Flags:

      • s: padding bit, 0 for +, 1 for -, available only when l is 11
      • d: shift direction, 0 for left, 1 for right
      • l: shift type code

        • 00: logical shift
        • 01: arithmetic shift
        • 10: rotate shift
        • 11: manual padding shift
      • shl: 6-bit shift amount
      • k: short process flag; if read as 0, the instruction is treated as a logical left shift by shl bits.
  • Register-Immediate variant:

    • Syntax: :mov offset dst, val
    • Type code: 1 / 2, RI / RI; if the literal has its 16th bit set, the second type code is used, otherwise the first
    • Description: move the immediate value val into the 16-bit part of dst selected by offset. low16 and high16 select the low and high 16 bits of the low 32 bits of dst; low16h and high16h select the low and high 16 bits of the high 32 bits of dst.

      • dst: target register
      • val: immediate value
      • offset: target offset
    • Flags:

      • o: 2-bit offset code

        • 00: low 16
        • 01: high 16
        • 10: low 16h
        • 11: high 16h
  • Register-Address(Register) variant:

    • Syntax: :mov ptr[dst], offset src
    • Type code: 3, RR(1)
    • Description: move data from src to the memory address held in dst. ptr gives the data size and can be qword (0), bytes (1), word (2) or dword (4); offset gives the source data offset and can be 0, 1, 2 or 4.

      • dst: target memory address register
      • src: source register
      • ptr: data size
      • offset: source data offset
    • Flags:

      • ss: 2-bit source data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword
      • o: 2-bit source data offset

        • 00: 0
        • 01: 1
        • 10: 2
        • 11: 4
  • Address(Register)-Register variant:

    • Syntax: :mov offset dst, ptr[src]
    • Type code: 4, RR(2)
    • Description: dereference the memory address held in src and move the data to dst. ptr gives the data size and can be qword (0), bytes (1), word (2) or dword (4); offset gives the target data offset and can be 0, 1, 2 or 4.

      • dst: target register
      • src: source memory address register
      • ptr: data size
      • offset: target data offset
    • Flags:

      • ss: 2-bit target data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword
      • o: 2-bit target data offset

        • 00: 0
        • 01: 1
        • 10: 2
        • 11: 4
  • Address(Register)-Address(Register) variant:

    • Syntax: :mov ptr [dst], [src]
    • Type code: 5, RR
    • Description: move data from the memory address held in src to the memory address held in dst. ptr gives the data size and can be qword (0), bytes (1), word (2) or dword (4).

      • dst: target memory address register
      • src: source memory address register
      • ptr: data size
    • Flags:

      • ss: 2-bit data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword
  • Register-Address(Register + Immediate) variant:

    • Syntax: :mov dst, ptr[base + offset]
    • Type code: 6, RRI
    • Description: move data from the memory address calculated as the base register plus the immediate offset to dst

      • dst: target register
      • base: base register for memory address calculation
      • offset: immediate offset for memory address calculation
    • Flags: none
  • Address(Register + Immediate)-Register variant:

    • Syntax: :mov ptr[base + offset], src
    • Type code: 7, RIR
    • Description: move data from src to the memory address calculated as the base register plus the immediate offset

      • src: source register
      • base: base register for memory address calculation
      • offset: immediate offset for memory address calculation
    • Flags: none
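
To make the RR row of the layout diagram and its shift flags concrete, here is a small Python sketch. The bit positions are read off the diagram (operator in bits 5..0, typ in 8..6, s in 9, shl in 15..10, d in 16, l in 18..17, k in 19, then the two register fields). Everything else is an assumption for illustration: registers are taken to be 64 bits wide, an arithmetic left shift behaves like a logical one, and the names decode_mov_rr, apply_shift, reg_a and reg_b are made up. The notes do not say which register field is dst, so the sketch does not decide that either.

```python
MASK64 = (1 << 64) - 1  # assumption: registers are 64 bits wide


def decode_mov_rr(word):
    """Split a 32-bit RR-form Mov word into its fields, low bits first."""
    return {
        "operator": word & 0x3F,        # bits 5..0
        "typ": (word >> 6) & 0x7,       # bits 8..6
        "s": (word >> 9) & 0x1,         # bit 9
        "shl": (word >> 10) & 0x3F,     # bits 15..10
        "d": (word >> 16) & 0x1,        # bit 16
        "l": (word >> 17) & 0x3,        # bits 18..17
        "k": (word >> 19) & 0x1,        # bit 19
        "reg_b": (word >> 20) & 0x3F,   # bits 25..20, second register field
        "reg_a": (word >> 26) & 0x3F,   # bits 31..26, first register field
    }


def apply_shift(value, f):
    """Shift a register value as described by the decoded flags f."""
    value &= MASK64
    n = f["shl"]
    if n == 0:
        return value                          # nothing to shift
    if f["k"] == 0:
        return (value << n) & MASK64          # short path: logical left shift
    left = f["d"] == 0
    high_fill = MASK64 ^ (MASK64 >> n)        # the top n bits set
    low_fill = (1 << n) - 1                   # the bottom n bits set
    if f["l"] == 0b10:                        # rotate
        if left:
            return ((value << n) | (value >> (64 - n))) & MASK64
        return ((value >> n) | (value << (64 - n))) & MASK64
    if left:
        out, pad = (value << n) & MASK64, low_fill
    else:
        out, pad = value >> n, high_fill
    if f["l"] == 0b01 and not left and value >> 63:
        out |= pad                            # arithmetic right shift keeps the sign
    if f["l"] == 0b11 and f["s"] == 1:
        out |= pad                            # manual padding with 1 bits
    return out
```

With k set to 0 the type and direction flags are ignored, which matches the "short process" description; with l equal to 11 the s bit decides whether the vacated bits are filled with 0 or 1.
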

All other combinations of source and target operands are invalid for the Mov instruction.

Basically, the mov instruction copies data from the source operand to the target operand directly.

The whole mov instruction family can be divided into three categories:

  • Register to register move (the register-register variant): copies data between registers, with an optional shift applied during the move. In the most common case, without a shift, the data in the source register is copied directly to the target register: dst = src; With the default shift, the data in the source register is shifted left logically by the specified number of bits and then moved to the target register: dst = src << shl; With an explicitly specified shift, the data in the source register is shifted according to the shift type and direction and then moved to the target register. With register to register moves, the user can simulate 32-bit general-purpose registers, as in the RISC-V or x86_64 architectures.
  • Immediate to register move (register-immediate variants I and II): moves an immediate value to the target register directly. 15 bits of the immediate value are stored in the instruction, and the 16th bit is distinguished by the type code. For register-immediate variant I the 16th bit is 1; for register-immediate variant II the 16th bit is 0. The immediate value is assigned to the 16-bit word of the target register selected by the offset parameter. Since the flags hold a 2-bit offset code, all four 16-bit words of the target register can be assigned separately.
  • Register to memory, memory to register and memory to memory move (the RR(1), RR(2), RR, RRI and RIR variants, type codes 3 to 7): addressing uses a register, or a register plus an immediate offset. The value in the register is read, the immediate is added to it where present, and the result is used as a memory address into the global data stack. Data size and data offset must be specified by flags. For data size:

    • 0 means qword (8 bytes)
    • 1 means bytes (1 byte)
    • 2 means word (2 bytes)
    • 3 means dword (4 bytes)

    For data offset:

    • 0 means offset 0
    • 1 means offset 1
    • 2 means offset 2
    • 3 means offset 4

    The Address(Register)-Address(Register) RR variant must have o set to 0. For the RR(A(R)R) or RRI variant, the number of bytes selected by ss is read from the source memory address and written to the target register at offset o. For the RR(RA(R)) or RIR variant, the number of bytes selected by ss is read from the source register at offset o and written to the target memory address. For the RR(A(R)A(R)) variant, the number of bytes selected by ss is read from the source memory address and written to the target memory address.
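
The immediate and memory-to-register moves described above can be sketched in Python roughly as follows. This is only an illustration under several assumptions that the notes do not state: registers are 64 bits wide, the bits outside the written part of the target register are preserved, the data offset is counted in bytes, and memory is little-endian. The names mov_immediate and load_into_register are made up.

```python
MASK64 = (1 << 64) - 1

SIZE_BYTES = {0b00: 8, 0b01: 1, 0b10: 2, 0b11: 4}     # ss code -> size in bytes
OFFSET_BYTES = {0b00: 0, 0b01: 1, 0b10: 2, 0b11: 4}   # o code -> data offset


def mov_immediate(dst, val16, o):
    """Write a 16-bit value into part o of dst (0 low16, 1 high16, 2 low16h, 3 high16h)."""
    shift = 16 * o
    cleared = dst & ~(0xFFFF << shift) & MASK64       # keep the other three parts
    return cleared | ((val16 & 0xFFFF) << shift)


def load_into_register(dst, memory, addr, ss, o):
    """Read the bytes selected by ss at addr and place them at offset o of dst."""
    size, off = SIZE_BYTES[ss], OFFSET_BYTES[o]
    if off + size > 8:
        raise ValueError("data does not fit into a 64-bit register at this offset")
    data = int.from_bytes(memory[addr:addr + size], "little")  # assumed byte order
    mask = ((1 << (8 * size)) - 1) << (8 * off)
    return (dst & ~mask & MASK64) | (data << (8 * off))
```

Building a full 64-bit constant then takes four immediate moves, one per offset code, which is why the 2-bit offset code is enough to reach every part of the register.
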

2.1.1.17.8.6  Instruction: LSD [LSD]

The LSD instruction is used to load or save data between the global data stack and register Reg#A or Reg#R.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Immediate     | literal                       |             | typ | operator  |

* registers: 6 bits register code
* literal: literal constant value, either in 16 bits or 8 bits integer.
* flags: reserved space, for instruction extension use
* RI and IR are two variant of same instruction, distinguish by instruction type

The LSD instruction has one immediate parameter.

  • Syntax: :lsd op idx
  • Description: load or save data between the global data stack at index idx and register Reg#A or Reg#R

    • op: operation type, one of the following:

      • load: load data from the global data stack into Reg#A
      • save: save data from Reg#A to the global data stack
      • loadr: load data from the global data stack into Reg#R
      • saver: save data from Reg#R to the global data stack
      • loadc: load pre-defined data into Reg#A
  • Flags:

    • typ: 3-bit operation type code

      • 000: load
      • 001: save
      • 010: load into Reg#R
      • 011: save from Reg#R
      • 100: load constant

For the load operation, data is loaded from the global data stack at index idx into Reg#A. For the save operation, data is saved from Reg#A into the global data stack at index idx. For the loadr operation, data is loaded from the global data stack at index idx into Reg#R. For the saver operation, data is saved from Reg#R into the global data stack at index idx. For the loadc operation, the pre-defined data with index idx is loaded into Reg#A. For loadc, the index idx can be:

  • 0: unsigned 64-bit integer 0
  • 1: unsigned 64-bit integer maximum value
  • 2: unsigned 64-bit integer minimum value
  • 3: signed 64-bit integer 0
  • 4: signed 64-bit integer maximum value
  • 5: signed 64-bit integer minimum value
  • 6: IEEE 754 double-precision floating-point 0.0
  • 7: IEEE 754 double-precision floating-point maximum value
  • 8: IEEE 754 double-precision floating-point minimum value
  • 9: IEEE 754 double-precision floating-point Not-a-Number (NaN)
  • 10: IEEE 754 double-precision floating-point positive infinity
  • 11: IEEE 754 double-precision floating-point negative infinity
  • 12: IEEE 754 single-precision floating-point 0.0
  • 13: IEEE 754 single-precision floating-point maximum value
  • 14: IEEE 754 single-precision floating-point minimum value
  • 15: IEEE 754 single-precision floating-point Not-a-Number (NaN)
  • 16: IEEE 754 single-precision floating-point positive infinity
  • 17: IEEE 754 single-precision floating-point negative infinity
  • 18: boolean true
  • 19: boolean false
  • 20: character '\0'
  • 21: character maximum value
  • 22: character minimum value
  • 23: null pointer

Basically, the lsd instruction provides a simple way to load or save data between the global data stack and register Reg#A or Reg#R.
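
A toy Python model of the five lsd operations might look like this. The Machine class, the stack size and the use of a plain list index for idx are assumptions for illustration. The constant table is deliberately abbreviated to entries whose value is unambiguous from the list above; how the real machine represents them in a register is not modelled.

```python
# typ codes from the flag table above.
TYP = {"load": 0b000, "save": 0b001, "loadr": 0b010, "saver": 0b011, "loadc": 0b100}

# Abbreviated loadc table (index -> value), written as Python values.
CONSTANTS = {
    0: 0,                # unsigned 64-bit integer 0
    1: 2**64 - 1,        # unsigned 64-bit integer maximum value
    2: 0,                # unsigned 64-bit integer minimum value
    3: 0,                # signed 64-bit integer 0
    4: 2**63 - 1,        # signed 64-bit integer maximum value
    5: -(2**63),         # signed 64-bit integer minimum value
    10: float("inf"),    # double-precision positive infinity
    11: float("-inf"),   # double-precision negative infinity
}


class Machine:
    """Hypothetical state: a global data stack plus Reg#A and Reg#R."""

    def __init__(self, stack_size=16):
        self.stack = [0] * stack_size
        self.a = 0
        self.r = 0

    def lsd(self, op, idx):
        typ = TYP[op]
        if typ == 0b000:
            self.a = self.stack[idx]      # load
        elif typ == 0b001:
            self.stack[idx] = self.a      # save
        elif typ == 0b010:
            self.r = self.stack[idx]      # loadr
        elif typ == 0b011:
            self.stack[idx] = self.r      # saver
        else:
            self.a = CONSTANTS[idx]       # loadc
```

A save followed by a loadr on the same index copies Reg#A into Reg#R through the stack, which is one way the two registers can exchange data with this instruction alone.
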

2.1.1.17.8.7  Instruction: OpI [OpI]

The OpI instruction is used to perform integer arithmetic.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
RR            | register  | register  |                 | op| typ | operator  |
RR(1)         | register  | register  |             |ss | op| typ | operator  |
RI            | register  | literal                     | op| typ | operator  |
IR            | register  | literal                     | op| typ | operator  |

* registers: 6 bits register code
* literal: literal constant value, either in 16 bits or 8 bits integer.
* flags: reserved space, for instruction extension use
* RI and IR are two variant of same instruction, distinguish by instruction type

The OpI instruction has the following variants:

  • Register-Register variant:

    • Syntax: :opi op dst, src
    • Type code: 0, RR
    • Description: perform the arithmetic operation op on the integers in dst and src, storing the result in Reg#A and the carry or remainder in Reg#R

      • dst: target register
      • src: source register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
  • Register-Immediate variant:

    • Syntax: :opi op dst, val
    • Type code: 1, RI; val can be at most a 15-bit integer, or a 14-bit signed integer
    • Description: perform the arithmetic operation op on the integer in dst and the immediate value val, storing the result in Reg#A and the carry or remainder in Reg#R

      • dst: target register
      • val: immediate value
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
  • Immediate-Register variant:

    • Syntax: :opi op val, src
    • Type code: 2, IR; val can be at most a 15-bit integer, or a 14-bit signed integer
    • Description: perform the arithmetic operation op on the integer in src and the immediate value val, storing the result in Reg#A and the carry or remainder in Reg#R

      • src: source register
      • val: immediate value
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
  • Register-Address(Register) variant:

    • Syntax: :opi op dst, ptr[src]
    • Type code: 3, RR(1)
    • Description: perform the arithmetic operation op on the integer in dst and the integer at the memory address held in src, storing the result in Reg#A and the carry or remainder in Reg#R. ptr gives the data size and can be qword (0), bytes (1), word (2) or dword (4).

      • dst: target register
      • src: source memory address register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit source data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword
  • Address(Register)-Register variant:

    • Syntax: :opi op ptr[dst], src
    • Type code: 4, RR(1)
    • Description: perform the arithmetic operation op on the integer at the memory address held in dst and the integer in src, storing the result in Reg#A and the carry or remainder in Reg#R. ptr gives the data size and can be qword (0), bytes (1), word (2) or dword (4).

      • dst: target memory address register
      • src: source register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit target data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword
  • Address(Register)-Address(Register) variant:

    • Syntax: :opi op ptr[dst], [src]
    • Type code: 5, RR
    • Description: perform the arithmetic operation op on the integer at the memory address held in dst and the integer at the memory address held in src, storing the result in Reg#A and the carry or remainder in Reg#R. ptr gives the data size and can be qword (0), bytes (1), word (2) or dword (4).

      • dst: target memory address register
      • src: source memory address register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword
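
To show how the RI row of the diagram packs into 32 bits, here is a short Python sketch. The field positions follow the diagram (operator in bits 5..0, typ in 8..6, op in 10..9, a 15-bit literal in 25..11, register in 31..26). The function names pack_ri and unpack_ri are made up, and since the notes give neither the operator code of OpI nor the numeric values of the 2-bit op code, those are plain parameters here.

```python
# (name, width in bits, position of the lowest bit), read off the RI row.
RI_FIELDS = (
    ("operator", 6, 0),
    ("typ", 3, 6),
    ("op", 2, 9),
    ("literal", 15, 11),
    ("reg", 6, 26),
)


def pack_ri(**values):
    """Pack the five RI fields into one 32-bit instruction word."""
    word = 0
    for name, bits, low in RI_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
        word |= value << low
    return word


def unpack_ri(word):
    """Inverse of pack_ri: split a 32-bit word back into its fields."""
    return {name: (word >> low) & ((1 << bits) - 1) for name, bits, low in RI_FIELDS}
```

The range check is where the 15-bit limit on val shows up: a literal of 1 << 15 is rejected, so a larger constant has to be built in a register first.
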

OpI supports the following arithmetic operations:

  • add: addition
  • sub: subtraction
  • mul: multiplication
  • div: division

After an OpI instruction is executed, the original values of Reg#A and Reg#R are overwritten.

Basically, the OpI instruction performs an arithmetic operation on integer data, storing the result in Reg#A and the carry or remainder in Reg#R. For addition, dst is treated as the augend and src as the addend. For subtraction, dst is treated as the minuend and src as the subtrahend. For multiplication, dst is treated as the multiplicand and src as the multiplier. For division, dst is treated as the dividend and src as the divisor.

2.1.1.17.8.8  Instruction: OpU [OpU]

The OpU instruction is used to perform integer arithmetic, treating its operands as unsigned integers.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
RR            | register  | register  |                 | op| typ | operator  |
RR(1)         | register  | register  |             |ss | op| typ | operator  |
RI            | register  | literal                     | op| typ | operator  |
IR            | register  | literal                     | op| typ | operator  |

* registers: 6 bits register code
* literal: literal constant value, either in 16 bits or 8 bits integer.
* flags: reserved space, for instruction extension use
* RI and IR are two variant of same instruction, distinguish by instruction type

The OpU instruction has the following variants:

  • Register-Register variant:

    • Syntax: :opu op dst, src
    • Type code: 0, RR
    • Description: perform the arithmetic operation op on the integers in dst and src, storing the result in Reg#A and the carry or remainder in Reg#R

      • dst: target register
      • src: source register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
  • Register-Immediate variant:

    • Syntax: :opu op dst, val
    • Type code: 1, RI; val can be at most a 15-bit integer, or a 14-bit signed integer
    • Description: perform the arithmetic operation op on the integer in dst and the immediate value val, storing the result in Reg#A and the carry or remainder in Reg#R

      • dst: target register
      • val: immediate value
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
  • Immediate-Register variant:

    • Syntax: :opu op val, src
    • Type code: 2, IR; val can be at most a 15-bit integer, or a 14-bit signed integer
    • Description: perform the arithmetic operation op on the integer in src and the immediate value val, storing the result in Reg#A and the carry or remainder in Reg#R

      • src: source register
      • val: immediate value
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
  • Register-Address(Register) variant:

    • Syntax: :opu op dst, ptr[src]
    • Type code: 3, RR(1)
    • Description: perform the arithmetic operation op on the integer in dst and the integer at the memory address held in src, storing the result in Reg#A and the carry or remainder in Reg#R. ptr gives the data size and can be qword (0), bytes (1), word (2) or dword (4).

      • dst: target register
      • src: source memory address register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit source data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword
  • Address(Register)-Register variant:

    • Syntax: :opu op ptr[dst], src
    • Type code: 4, RR(1)
    • Description: perform the arithmetic operation op on the integer at the memory address held in dst and the integer in src, storing the result in Reg#A and the carry or remainder in Reg#R. ptr gives the data size and can be qword (0), bytes (1), word (2) or dword (4).

      • dst: target memory address register
      • src: source register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit target data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword
  • Address(Register)-Address(Register) variant:

    • Syntax: :opu op ptr[dst], [src]
    • Type code: 5, RR
    • Description: perform the arithmetic operation op on the integer at the memory address held in dst and the integer at the memory address held in src, storing the result in Reg#A and the carry or remainder in Reg#R. ptr gives the data size and can be qword (0), bytes (1), word (2) or dword (4).

      • dst: target memory address register
      • src: source memory address register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword

OpU supports the following arithmetic operations:

  • add: addition
  • sub: subtraction
  • mul: multiplication
  • div: division

After an OpU instruction is executed, the original values of Reg#A and Reg#R are overwritten.

Basically, the OpU instruction performs an arithmetic operation on unsigned integer data, storing the result in Reg#A and the carry or remainder in Reg#R. For addition, dst is treated as the augend and src as the addend. For subtraction, dst is treated as the minuend and src as the subtrahend. For multiplication, dst is treated as the multiplicand and src as the multiplier. For division, dst is treated as the dividend and src as the divisor.
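
The result and carry or remainder pair can be sketched for the unsigned case in Python as below. The notes only say that Reg#R receives the "carry or remainder", so the exact meaning per operation is an assumption here: carry out for add, borrow for sub, the high 64 bits of the product for mul, and the remainder for div. The opu function and the DivisionByZero class are made up, and 64-bit registers are assumed.

```python
MASK64 = (1 << 64) - 1  # assumption: operands are 64-bit unsigned values


class DivisionByZero(Exception):
    """Stands in for exception code 0x05 from the Raise code list."""
    code = 0x05


def opu(op, dst, src):
    """Return the pair (Reg#A, Reg#R) after an unsigned operation on dst and src."""
    dst &= MASK64
    src &= MASK64
    if op == "add":
        full = dst + src
        return full & MASK64, full >> 64             # R: carry out (assumed)
    if op == "sub":
        return (dst - src) & MASK64, int(dst < src)  # R: borrow (assumed)
    if op == "mul":
        full = dst * src
        return full & MASK64, full >> 64             # R: high 64 bits (assumed)
    if op == "div":
        if src == 0:
            raise DivisionByZero()
        return dst // src, dst % src                 # R: remainder
    raise ValueError(f"unknown operation {op!r}")
```

Both registers are returned together because, as noted above, every OpU instruction overwrites Reg#A and Reg#R, even when the second value is just 0.
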

2.1.1.17.8.9  Instruction: OpF [OpF]

The OpF instruction is used to perform floating-point arithmetic.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
RR            | register  | register  |             |ss | op| typ | operator  |

* registers: 6 bits register code
* literal: literal constant value, either in 16 bits or 8 bits integer.
* flags: reserved space, for instruction extension use
* RI and IR are two variant of same instruction, distinguish by instruction type

The OpF instruction has the following variants:

  • Register-Register variant:

    • Syntax: :opf op dst, src
    • Type code: 0, RR
    • Description: perform the arithmetic operation op on the floating-point values in dst and src, storing the result in Reg#A and the remainder or the high 64 bits of a float 128 result in Reg#R

      • dst: target register
      • src: source register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit source data size code

        • 00: double-precision
        • 01: single-precision
        • 10: float 128
  • Register-Address(Register) variant:

    • Syntax: :opf op dst, ptr[src]
    • Type code: 1, RR(1)
    • Description: perform the arithmetic operation op on the floating-point value in dst and the floating-point value at the memory address held in src, storing the result in Reg#A and the remainder or the high 64 bits of a float 128 result in Reg#R. ptr gives the data size and can be qword (0), dword (1) or float128 (2).

      • dst: target register
      • src: source memory address register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit source data size code

        • 00: double-precision
        • 01: single-precision
        • 10: float 128
  • Address(Register)-Register variant:

    • Syntax: :opf op ptr[dst], src
    • Type code: 2, RR(1)
    • Description: perform the arithmetic operation op on the floating-point value at the memory address held in dst and the floating-point value in src, storing the result in Reg#A and the remainder or the high 64 bits of a float 128 result in Reg#R. ptr gives the data size and can be qword (0), dword (1) or float128 (2).

      • dst: target memory address register
      • src: source register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit target data size code

        • 00: double-precision
        • 01: single-precision
        • 10: float 128
  • Address(Register)-Address(Register) variant:

    • Syntax: :opf op ptr[dst], [src]
    • Type code: 3, RR
    • Description: perform the arithmetic operation op on the floating-point value at the memory address held in dst and the floating-point value at the memory address held in src, storing the result in Reg#A and the remainder or the high 64 bits of a float 128 result in Reg#R. ptr gives the data size and can be qword (0), dword (1) or float128 (2).

      • dst: target memory address register
      • src: source memory address register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit data size code

        • 00: double-precision
        • 01: single-precision
        • 10: float 128
  • Register-Register Modulus variant:

    • Syntax: :opf fmod dst, src
    • Type code: 4, RR
    • Description: perform the floating-point modulus operation on dst and src, storing the result in Reg#A and the remainder or the high 64 bits of a float 128 result in Reg#R

      • dst: target register
      • src: source register
    • Flags:

      • ss: 2-bit source data size code

        • 00: double-precision
        • 01: single-precision
        • 10: float 128
  • Address(Register)-Address(Register) Modulus variant:

    • Syntax: :opf fmod ptr [dst], [src]
    • Type code: 5, RR
    • Description: perform the floating-point modulus operation on the values at the memory addresses held in dst and src, storing the result in Reg#A and the remainder or the high 64 bits of a float 128 result in Reg#R

      • dst: target memory address register
      • src: source memory address register
    • Flags:

      • ss: 2-bit data size code

        • 00: double-precision
        • 01: single-precision
        • 10: float 128

If ss ss is 10 10 , float 128 operation is performed, and Reg#R Reg#R store high 64 bits of result. If ss ss is 10 10 , type code must be 3 or 5, only when both operands are memory address.

After OpF OpF instruction executed, original Reg#A Reg#A and Reg#R Reg#R values are overwritten.

2.1.1.17.8.10  Instruction: OpB [OpB]

The OpB instruction performs bitwise computation.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
RR            | register  | register  |                 |op | typ | operator  |
RI            | register  | literal                     |op | typ | operator  |

* registers: 6 bits register code
* literal: literal constant value, either in 16 bits or 8 bits integer.
* flags: reserved space, for instruction extension use
* RI and IR are two variant of same instruction, distinguish by instruction type

The OpB instruction has the following variants:

  • Register-Register variant:

    • Syntax: :opb op dst, src
    • Type code: 0, RR
    • Description: perform bitwise operation op on dst and src, store the result into Reg#A

      • dst: target register
      • src: source register
      • op: bitwise operation
    • Flags:

      • op: 2-bit operation code
  • Register-Immediate variant:

    • Syntax: :opb op dst, val
    • Type code: 1, RI; val can be at most a 15-bit integer, or a 14-bit signed integer
    • Description: perform bitwise operation op on dst and the immediate value val, store the result into Reg#A

      • dst: target register
      • val: immediate value
      • op: bitwise operation
    • Flags:

      • op: 2-bit operation code
  • Address(Register)-Register variant:

    • Syntax: :opb op ptr[dst], src
    • Type code: 2, RR
    • Description: perform bitwise operation op on the value at memory address dst and src, store the result into Reg#A

      • dst: target memory address register
      • src: source register
      • op: bitwise operation
    • Flags:

      • op: 2-bit operation code
  • Register-Address(Register) variant:

    • Syntax: :opb op dst, ptr[src]
    • Type code: 3, RR
    • Description: perform bitwise operation op on dst and the value at memory address src, store the result into Reg#A

      • dst: target register
      • src: source memory address register
      • op: bitwise operation
    • Flags:

      • op: 2-bit operation code
  • Address(Register)-Address(Register) variant:

    • Syntax: :opb op ptr[dst], [src]
    • Type code: 4, RR
    • Description: perform bitwise operation op on the values at memory addresses dst and src, store the result into Reg#A

      • dst: target memory address register
      • src: source memory address register
      • op: bitwise operation
    • Flags:

      • op: 2-bit operation code
  • Address(Register)-Immediate variant:

    • Syntax: :opb op ptr[dst], val
    • Type code: 5, RI; val can be at most a 15-bit integer, or a 14-bit signed integer
    • Description: perform bitwise operation op on the value at memory address dst and the immediate value val, store the result into Reg#A

      • dst: target memory address register
      • val: immediate value
      • op: bitwise operation
    • Flags:

      • op: 2-bit operation code

OpB supports the following bitwise operations:

  • and: bitwise AND
  • or: bitwise OR
  • xor: bitwise XOR
  • not: bitwise NOT

When op is not, the result of applying NOT to dst is stored into Reg#A, and the result for the other operand is written to Reg#R. For all other operations, the result is stored into Reg#A and Reg#R is not modified.
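To make the OpB bit layout concrete, here is a minimal C sketch that pulls the fields out of a 32-bit instruction word. The bit positions are read off the diagram in this section (operator in bits 5..0, typ in bits 8..6, op in bits 10..9, registers in bits 31..26 and 25..20, the RI literal in bits 25..11). The names opb_fields, opb_decode and opb_is_immediate are made up for illustration, and treating type codes 1 and 5 as the RI layouts follows the variant list; none of this is the actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an OpB instruction word, as read off the bit diagram.
 * The struct and function names are hypothetical. */
typedef struct {
    unsigned operator_code; /* bits 5..0 */
    unsigned typ;           /* bits 8..6 */
    unsigned op;            /* bits 10..9 */
    unsigned reg_a;         /* bits 31..26, first register field */
    unsigned reg_b;         /* bits 25..20, second register field (RR layout) */
    unsigned literal;       /* bits 25..11, 15-bit literal (RI layout) */
} opb_fields;

/* Extract `width` bits of `word`, starting at bit `lo`. */
static unsigned bits(uint32_t word, unsigned lo, unsigned width) {
    assert(width > 0 && width < 32 && lo + width <= 32);
    return (unsigned)((word >> lo) & ((UINT32_C(1) << width) - 1));
}

/* Decode every field; the caller picks reg_b or literal based on typ. */
static opb_fields opb_decode(uint32_t word) {
    opb_fields f;
    f.operator_code = bits(word, 0, 6);
    f.typ           = bits(word, 6, 3);
    f.op            = bits(word, 9, 2);
    f.reg_a         = bits(word, 26, 6);
    f.reg_b         = bits(word, 20, 6);
    f.literal       = bits(word, 11, 15);
    return f;
}

/* Assumption: type codes 1 and 5 are the two RI variants listed above. */
static int opb_is_immediate(const opb_fields *f) {
    return f->typ == 1 || f->typ == 5;
}
```

The decoder fills both reg_b and literal because the two layouts overlap; only the one that matches typ is meaningful.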

2.1.1.17.8.11  Instruction: OpS [OpS]

The OpS instruction performs shift computation.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
RR            | register  | register  |               | opt | typ | operator  |

* registers: 6 bits register code
* literal: literal constant value, either in 16 bits or 8 bits integer.
* flags: reserved space, for instruction extension use
* RI and IR are two variant of same instruction, distinguish by instruction type

The OpS instruction has the following variants:

  • Register-Register variant:

    • Syntax: :ops op dst, src
    • Type code: 0, RR
    • Description: perform shift operation op on dst by the number of bits in src, store the result into Reg#A

      • dst: target register
      • src: source register
      • op: shift operation
    • Flags:

      • opt: 3-bit operation code

        • 000: logical left shift
        • 001: logical right shift
        • 010: arithmetic left shift
        • 011: arithmetic right shift
        • 100: rotate left
        • 101: rotate right
        • 110: rotate through carry left
        • 111: rotate through carry right
  • Register-Address(Register) variant:

    • Syntax: :ops op dst, ptr[src]
    • Type code: 1, RR
    • Description: perform shift operation op on dst by the number of bits stored at memory address src, store the result into Reg#A

      • dst: target register
      • src: source memory address register
      • op: shift operation
    • Flags:

      • opt: 3-bit operation code

        • 000: logical left shift
        • 001: logical right shift
        • 010: arithmetic left shift
        • 011: arithmetic right shift
        • 100: rotate left
        • 101: rotate right
        • 110: rotate through carry left
        • 111: rotate through carry right
  • Address(Register)-Register variant:

    • Syntax: :ops op ptr[dst], src
    • Type code: 2, RR
    • Description: perform shift operation op on the value at memory address dst by the number of bits in src, store the result into Reg#A

      • dst: target memory address register
      • src: source register
      • op: shift operation
    • Flags:

      • opt: 3-bit operation code

        • 000: logical left shift
        • 001: logical right shift
        • 010: arithmetic left shift
        • 011: arithmetic right shift
        • 100: rotate left
        • 101: rotate right
        • 110: rotate through carry left
        • 111: rotate through carry right
  • Address(Register)-Address(Register) variant:

    • Syntax: :ops op ptr[dst], [src]
    • Type code: 3, RR
    • Description: perform shift operation op on the value at memory address dst by the number of bits stored at memory address src, store the result into Reg#A

      • dst: target memory address register
      • src: source memory address register
      • op: shift operation
    • Flags:

      • opt: 3-bit operation code

        • 000: logical left shift
        • 001: logical right shift
        • 010: arithmetic left shift
        • 011: arithmetic right shift
        • 100: rotate left
        • 101: rotate right
        • 110: rotate through carry left
        • 111: rotate through carry right
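
As a rough illustration of what the opt codes mean, the C sketch below applies the first six of them to a 64-bit value. It rests on assumptions the notes do not state: a 64-bit operand, a shift count taken modulo 64, and hypothetical names (shift_op, shift_apply). The two rotate-through-carry codes are left out because they depend on the carry flag.

```c
#include <assert.h>
#include <stdint.h>

/* opt codes from the table above; the enum names are hypothetical. */
enum shift_op {
    SHL = 0, /* logical left shift */
    SHR = 1, /* logical right shift */
    SAL = 2, /* arithmetic left shift */
    SAR = 3, /* arithmetic right shift */
    ROL = 4, /* rotate left */
    ROR = 5  /* rotate right */
};

/* Apply shift `opt` to a 64-bit value.
 * Assumption: the count is reduced modulo 64. */
static uint64_t shift_apply(enum shift_op opt, uint64_t x, unsigned count) {
    unsigned n = count & 63u;
    switch (opt) {
    case SHL:
    case SAL: /* left shifts produce the same bits either way */
        return x << n;
    case SHR:
        return x >> n;
    case SAR:
        /* portable sign-propagating shift: for negative values,
         * shift the complement and complement back */
        return (x >> 63) ? ~(~x >> n) : (x >> n);
    case ROL:
        return n ? (x << n) | (x >> (64 - n)) : x;
    case ROR:
        return n ? (x >> n) | (x << (64 - n)) : x;
    }
    assert(!"unsupported opt code");
    return 0;
}
```

The explicit n == 0 check in the rotates avoids shifting by 64, which is undefined in C.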
2.1.1.17.8.12  Instruction: Test [Test]

The Test instruction tests a condition and jumps to the target address if the condition is met.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
II            | literal       | literal       |             | typ | operator  |

* literal: literal constant value, either in 16 bits or 8 bits integer.
* flags: reserved space, for instruction extension use

The Test instruction takes two immediate parameters.

  • Syntax: :test cond, addr
  • Description: test condition cond; if it is met, jump to target address addr

    • cond: condition to be tested
    • addr: target address to jump to if the condition is met
  • Flags: none

cond is in fact an integer; to avoid confusion, it can be written with the following names:

  • Test#e, 0, equal, zero flag is set
  • Test#g, 1, greater, not equal and sign flag equals overflow flag
  • Test#ng, 2, not greater, equal or sign flag does not equal overflow flag
  • Test#l, 3, less, sign flag does not equal overflow flag
  • Test#nl, 4, not less, sign flag equals overflow flag
  • Test#o, 5, overflow, overflow flag is set
  • Test#no, 6, not overflow, overflow flag is not set
  • Test#c, 7, carry, carry flag is set
  • Test#nc, 8, not carry, carry flag is not set
  • Test#z, 9, zero, zero flag is set
  • Test#nz, 10, not zero, zero flag is not set
  • Test#s, 11, sign, sign flag is set
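
The condition table maps directly onto a predicate over the four flags. A small C sketch of that mapping, assuming a hypothetical flags_t struct and a test_holds helper (neither name comes from the notes):

```c
#include <assert.h>
#include <stdbool.h>

/* The four status flags the conditions refer to (hypothetical struct). */
typedef struct {
    bool zero, sign, overflow, carry;
} flags_t;

/* Condition codes 0..11 in the order of the list above. */
enum test_cond {
    TEST_E = 0, TEST_G, TEST_NG, TEST_L, TEST_NL, TEST_O,
    TEST_NO, TEST_C, TEST_NC, TEST_Z, TEST_NZ, TEST_S
};

/* Return true when `cond` is met for the given flags. */
static bool test_holds(enum test_cond cond, flags_t f) {
    switch (cond) {
    case TEST_E:
    case TEST_Z:  return f.zero;
    case TEST_G:  return !f.zero && f.sign == f.overflow;
    case TEST_NG: return f.zero || f.sign != f.overflow;
    case TEST_L:  return f.sign != f.overflow;
    case TEST_NL: return f.sign == f.overflow;
    case TEST_O:  return f.overflow;
    case TEST_NO: return !f.overflow;
    case TEST_C:  return f.carry;
    case TEST_NC: return !f.carry;
    case TEST_NZ: return !f.zero;
    case TEST_S:  return f.sign;
    }
    assert(!"unknown condition code");
    return false;
}
```

For example, after comparing 3 with 5 one would expect the sign flag set and the zero and overflow flags clear, so Test#l holds and Test#g does not.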
2.1.1.17.8.13  Instruction: Jmp [Jmp]

The Jmp instruction jumps to the target address unconditionally.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Register      | register  |                                 | typ | operator  |
Immediate     | literal                       |             | typ | operator  |
RI            | register  | literal                     |   | typ | operator  |

* registers: 6 bits register code
* literal: literal constant value, either in 16 bits or 8 bits integer.
* flags: reserved space, for instruction extension use

The Jmp instruction has the following variants:

  • Register variant:

    • Syntax: :jmp short dst
    • Type code: 0, R
    • Description: jump to the target address held in register dst

      • dst: target register
    • Flags: none
  • Immediate variant:

    • Syntax: :jmp near offset
    • Type code: 1, I
    • Description: jump to the address at the given offset from the function entry point

      • offset: target address offset
    • Flags: none
  • Register-Immediate variant:

    • Syntax: :jmp far dst, offset
    • Type code: 2, RI
    • Description: jump to the function with index dst in the function unit vector, plus the address offset given by offset

      • dst: target register
      • offset: immediate offset
    • Flags: none

Basically, the jmp instruction provides a way to jump to a target address unconditionally. It is used for control flow transfer during program execution.

2.1.1.17.8.14  Instruction: Loop [Loop]

The Loop instruction performs a loop operation with the counter register.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Immediate     | literal                       |             | typ | operator  |

* literal: literal constant value, either in 16 bits or 8 bits integer.

The Loop instruction takes one immediate parameter.

  • Syntax: :loop addr
  • Description: decrement counter register Reg#C; if it is not zero, jump to target address addr

    • addr: target address to jump to if the counter is not zero
  • Flags: none

Loop works like the x86 loop instruction: it decrements the counter register Reg#C by 1.

2.1.1.17.8.15  Instruction: Call [Call]

The Call instruction calls the function at the target address.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Register      | register  |                                 | typ | operator  |
Immediate     | literal                       |             | typ | operator  |

* registers: 6 bits register code
* literal: literal constant value, either in 16 bits or 8 bits integer.
* flags: reserved space, for instruction extension use
* RI and IR are two variant of same instruction, distinguish by instruction type

The Call instruction has the following variants:

  • Register variant:

    • Syntax: :call dst
    • Type code: 0, R
    • Description: call the function at the target address held in register dst

      • dst: target register
    • Flags: none
  • Immediate variant:

    • Syntax: :call idx
    • Type code: 1, I
    • Description: call the function with index idx in the function unit vector

      • idx: target index
    • Flags: none

Basically, the call instruction provides a way to call a function. All necessary function call setup must be done before the call instruction is executed. The top of the execution stack always tracks both the pointer to the function and the execution status of that function, so the return address is stored automatically when the call instruction is executed. The call instruction pushes a new execution context onto the execution stack.

2.1.1.17.8.16  Instruction: Ret [Ret]

The Ret instruction returns from a function call.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Zero          | flags                                       | typ | operator  |

* flags: reserved space, for instruction extension use

The Ret instruction has no parameters.

  • Syntax: :ret
  • Description: return from the current function call to the caller function
  • Flags: none

Basically, the ret instruction provides a way to return from a function call. When the ret instruction is executed, the current execution context is popped from the execution stack, and Reg#PC and Reg#EP are restored to the caller function's context.
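
The push-on-call, pop-on-return behaviour of the execution stack can be sketched in a few lines of C. Everything below is an illustrative assumption rather than the real layout: a context is modelled as a function index plus a program counter, the stack has a fixed capacity, and the names (exec_ctx, exec_stack, vm_call, vm_ret, vm_top) are made up.

```c
#include <assert.h>
#include <stddef.h>

/* One execution context: which function is running and where.
 * Hypothetical layout for illustration only. */
typedef struct {
    size_t func_index; /* index into the function unit vector */
    size_t pc;         /* position of the next instruction in that function */
} exec_ctx;

#define EXEC_STACK_MAX 64

typedef struct {
    exec_ctx frames[EXEC_STACK_MAX];
    size_t depth; /* number of live contexts; frames[depth - 1] is the top */
} exec_stack;

/* Top of the execution stack: the context currently being executed. */
static exec_ctx *vm_top(exec_stack *s) {
    assert(s->depth > 0);
    return &s->frames[s->depth - 1];
}

/* call: push a fresh context for the callee. The caller's pc stays in
 * its own frame, so no separate return address has to be saved. */
static int vm_call(exec_stack *s, size_t callee) {
    if (s->depth == EXEC_STACK_MAX) return -1; /* stack overflow */
    s->frames[s->depth].func_index = callee;
    s->frames[s->depth].pc = 0;
    s->depth++;
    return 0;
}

/* ret: pop the current context; the caller's context becomes the top again. */
static int vm_ret(exec_stack *s) {
    if (s->depth <= 1) return -1; /* nothing to return to */
    s->depth--;
    return 0;
}
```

Because each frame keeps its own pc, returning only needs to drop the top frame; the caller resumes from the position it already recorded.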

2.1.1.17.8.17  Instruction: IRet [IRet]

The IRet instruction returns from an interrupt handler.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Zero          | flags                                       | typ | operator  |

* flags: reserved space, for instruction extension use

The IRet instruction has no parameters.

  • Syntax: :iret
  • Description: return from the current interrupt handler to the interrupted context
  • Flags: none

Basically, the iret instruction provides a way to return from an interrupt handler. When the iret instruction is executed, the register information saved when the interrupt occurred is restored. The execution stack is popped and execution of the previously running function continues.

2.1.1.17.8.18  Instruction: RegF [RegF]

The RegF instruction registers a new function.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
RR            | register  | register  |                     | typ | operator  |

* registers: 6 bits register code

The RegF instruction has two parameters.

  • Syntax: :regf skip, len
  • Description: register a new function with code length len, then skip over skip bytes after registration

    • skip: number of bytes to skip after registration
    • len: length of the function code in bytes
  • Flags: none

The RegF instruction creates a new function unit and assigns the given data as its text. If skip and len are not aligned to the instruction size, an invalid instruction exception is raised.

2.1.1.17.8.19  Instruction: Stack [Stack]

The Stack instruction manipulates the global data stack.

2.1.2 Tags
2.1.2.1 c
2.1.2.1.1  Stanford CS107: Programming Paradigm [S1]
2.1.2.1.1.1 Data Types and Conversion
2.1.2.1.1.1.1 Binary Numbers

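For positive numbers, adding them directly gives the correct result (as long as it stays in range).

When negative numbers are involved, the sign has to be represented somehow.

  1. Sign-magnitude: use the most significant bit as the sign, 0 for positive and 1 for negative.

    A negative number is written by simply setting the most significant bit to 1. Ordinary binary addition then no longer works: adding it to a positive number can give a wrong result.

       00000000 00000111   (+7)
     + 10000000 00000111   (-7)
    ----------------------
       10000000 00001110   (-14)

    What we need is a representation in which a negative number plus the corresponding positive number gives 0 (with the carry out of the most significant bit discarded).

  2. One's complement: invert every bit of the value.

    Adding a positive number and the negative number with the same absolute value gives all ones, which causes the +0 and -0 problem.

       00000000 00000111   (+7)
     + 11111111 11111000   (-7)
    ----------------------
       11111111 11111111   (0xffff)
  3. Two's complement: add 1 to the result of 2 to get the desired representation; in practice, the 1 is added into the negative number.

       00000000 00000111   (+7)
     + 11111111 11111001   (-7)
    ----------------------
    (1)00000000 00000000   (0x0000)

    Mathematical meaning of two's complement: addition modulo 2^n forms an abelian group, and the two's complement of a positive integer is its additive inverse.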
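
The +7 / -7 example can be checked in C with 16-bit unsigned arithmetic. This is a minimal sketch; the helper names (negate16, add16, demo_twos_complement) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Two's complement negation on 16 bits: invert every bit, then add 1. */
static uint16_t negate16(uint16_t x) {
    return (uint16_t)(~x + 1u);
}

/* 16-bit addition; the carry out of bit 15 is dropped by the cast. */
static uint16_t add16(uint16_t a, uint16_t b) {
    return (uint16_t)(a + b);
}

/* Self-check mirroring the three representations discussed above. */
static void demo_twos_complement(void) {
    uint16_t ones = (uint16_t)~(uint16_t)7; /* one's complement of 7 */
    assert(ones == 0xFFF8);
    assert(add16(7, ones) == 0xFFFF);       /* all ones: the "-0" pattern */

    assert(negate16(7) == 0xFFF9);          /* two's complement of 7 */
    assert(add16(7, negate16(7)) == 0);     /* a true additive inverse */
}
```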

2.1.2.1.1.1.2 Characters

A character is itself just a number.

2.1.2.1.1.1.3 Convert

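Assigning a smaller type to a larger one roughly copies the value into the low-order bits of the larger type.

Assigning a larger type to a smaller one simply discards the high-order bits.

When a negative value is widened, the high-order bits are filled with the sign bit (sign extension); otherwise they are filled with 0.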
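
These conversion rules can be observed directly with the fixed-width integer types. A small C sketch (the function name demo_conversions is made up):

```c
#include <assert.h>
#include <stdint.h>

/* Walk through widening, narrowing, and character-as-number behaviour. */
static void demo_conversions(void) {
    int8_t  small_neg = -7;   /* bit pattern 0xF9 */
    uint8_t small_pos = 0xF9; /* same bit pattern, unsigned */

    /* widening a signed value fills the high bits with the sign bit */
    int32_t wide_signed = small_neg;
    assert((uint32_t)wide_signed == 0xFFFFFFF9u);

    /* widening an unsigned value fills the high bits with 0 */
    uint32_t wide_unsigned = small_pos;
    assert(wide_unsigned == 0x000000F9u);

    /* narrowing an unsigned value keeps only the low-order bits */
    uint32_t big = 0x12345678u;
    uint16_t low = (uint16_t)big;
    assert(low == 0x5678u);

    /* a character is just a small integer: digits are consecutive codes */
    assert('0' + 7 == '7');
}
```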

2.1.2.1.1.1.4 Floats
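  1. Fixed-point binary fractions: a few bits are used to represent the 2^{-n} places.

    The number of integer and fraction digits that can be represented is fixed.

    Floating-point numbers use a limited number of bits and limited precision to approximate exact values from a dense number field.

  2. float 32: IEEE 754 base-2 floating-point number

    [sign] [<<--- magnitude --->>] [<-fractions>]
    [1/0 ] [exp(unsigned integer)] [base(2^{-n})]

    In practice, val_{(10)} = (-1)^{sign} \times 1.base \times 2^{exp - 2^{bits(exp) - 1} + 1}. For float 32, bits(exp) = 8, so the exponent is exp - 127.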
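
To see the sign / exp / base split in action, the C sketch below rebuilds a value from the three bit fields of a float 32. It assumes the host float is an IEEE 754 binary32 and only handles normal numbers (zero, subnormals, infinities and NaN are excluded); the helper names are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 2^e for small integer e, computed exactly by repeated doubling or halving. */
static double pow2(int e) {
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r /= 2.0;
    return r;
}

/* Reinterpret the bytes of a float as a 32-bit integer.
 * Assumption: float is IEEE 754 binary32 on this machine. */
static uint32_t float32_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Rebuild the value of a normal float 32 from its fields:
 * (-1)^sign * 1.base * 2^(exp - 127). */
static double float32_from_bits(uint32_t bits) {
    uint32_t sign = bits >> 31;
    uint32_t exp  = (bits >> 23) & 0xFFu;
    uint32_t base = bits & 0x7FFFFFu;
    assert(exp != 0 && exp != 0xFF); /* normal numbers only */

    double significand = 1.0 + (double)base / 8388608.0; /* 8388608 = 2^23 */
    double value = significand * pow2((int)exp - 127);
    return sign ? -value : value;
}
```

For instance, the bit pattern 0xC0200000 has sign 1, exp 128 and fraction 0.25, which gives -1.25 × 2^1 = -2.5.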

2.1.2.1.1.1.5 Endian

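The byte that holds the most significant bits is called the big end, and the byte that holds the least significant bits is called the little end.

Little-endian: the little end is stored at the lowest address. Big-endian: the big end is stored at the lowest address.

Big-endian matches the order in which humans read numbers.

Which byte a pointer to a multi-byte value refers to depends on the byte order.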
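
A quick way to see the effect of byte order on what a pointer sees is to look at the first byte of a known 32-bit value. The C sketch below does that, and also shows how to write bytes in big-endian order independent of the host; the function names are made up, and the check assumes the host is either little-endian or big-endian.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian machine, 0 on a big-endian one. */
static int is_little_endian(void) {
    uint32_t value = 0x01020304u;
    unsigned char first;
    memcpy(&first, &value, 1); /* the byte at the lowest address */
    assert(first == 0x04 || first == 0x01);
    return first == 0x04;
}

/* Store `value` in big-endian order regardless of the host byte order. */
static void store_be32(unsigned char out[4], uint32_t value) {
    out[0] = (unsigned char)(value >> 24); /* big end first */
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;         /* little end last */
}
```

Shifting works on values rather than memory, so store_be32 gives the same bytes on every host.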

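2.1.2.1.1.2 Structure ( struct )

A pointer to a structure points to its start address; the other members are accessed through their offset from that start address. (The start address acts as a base address, much like base and offset addresses in assembly. In assembly the offset address is in units of 0x10, while here it is in units of 0x1, and a member's offset equals the total size of the members before it, plus any alignment padding.)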

2.1.2.1.1.2.1 Array

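A pointer to an array points to its start address; the other elements are accessed through their offset from that start address. This is much like a structure, except that the offset of element n is n times the element size.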
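
Both rules (member = base + member offset, element n = base + n × element size) can be checked with plain pointer arithmetic. A minimal C sketch, using a made-up struct pair and a made-up demo function:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-member structure used only for this illustration. */
struct pair {
    int first;
    int second;
};

static void demo_offsets(void) {
    struct pair p = { 1, 2 };
    int arr[4] = { 10, 20, 30, 40 };

    /* struct member = base address + offset of the member */
    char *base = (char *)&p;
    assert((void *)(base + offsetof(struct pair, second)) == (void *)&p.second);
    /* the offset covers at least the members before it; padding may add more */
    assert(offsetof(struct pair, second) >= sizeof(int));

    /* array element n = base address + n * element size */
    char *abase = (char *)arr;
    assert((void *)(abase + 2 * sizeof arr[0]) == (void *)&arr[2]);
    assert(*(int *)(abase + 2 * sizeof arr[0]) == 30);
}
```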

2.1.2.1.1.2.2 Generic

C-style generics:

#include <string.h>  /* memcpy */

/* Swap two objects of size bytes each through a temporary buffer. */
void swap(void* ap, void* bp, size_t size) {
  char tmp[size];  /* C99 variable-length array used as scratch space */
  memcpy(tmp, ap, size);
  memcpy(ap, bp, size);
  memcpy(bp, tmp, size);
}

Compared with templates, C-style generics do not need to generate a separate binary for each type that uses the same algorithm, which avoids binary bloat.

For lsearch, see [ulibs.c: binsearch_linear](https://github.com/mujiu555/ublis.c)

Example for generic:

c
#include <stddef.h>  /* NULL */

/* Linear search: return the address of the first element equal to key, or NULL. */
void * lsearch (
  void* key,
  void * base,
  int n,
  int elem_size,
  int (* cmpfn)(void *, void *)
) {
  for (int i = 0; i < n; i ++) {
    /* step through the array in bytes: element i starts at base + i * elem_size */
    void * elemAddr = (char *) base + i * elem_size;
    if (cmpfn (key, elemAddr) == 0) {
      return elemAddr;
    }
  }
  return NULL;
}
2.1.2.1.1.3 Stack
2.1.2.1.1.3.1 Stack with int
2.1.2.1.1.3.2 Generic Stack
2.1.2.1.1.4 Memory Management

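If the elements have to be destroyed together with the generic stack, the stack must be given a free function for its elements.

Be careful to distinguish between a pointer and the address it is stored at.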
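
A minimal C sketch of a generic stack that takes such a free function. The layout and all names (stack, stack_new, stack_push, stack_dispose, string_free) are assumptions for illustration, not the course's exact code. The point to notice is that the free function receives the address of each element, so for a stack of char * it is handed a char **.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    void  *elems;                    /* contiguous element storage */
    size_t elem_size;                /* size of one element in bytes */
    size_t len, cap;                 /* used and allocated element counts */
    void (*freefn)(void *elem_addr); /* may be NULL for plain data */
} stack;

static void stack_new(stack *s, size_t elem_size, void (*freefn)(void *)) {
    assert(elem_size > 0);
    s->elem_size = elem_size;
    s->len = 0;
    s->cap = 4;
    s->freefn = freefn;
    s->elems = malloc(s->cap * elem_size);
    assert(s->elems != NULL);
}

/* Copy elem_size bytes from elem_addr onto the top of the stack. */
static void stack_push(stack *s, const void *elem_addr) {
    if (s->len == s->cap) {
        s->cap *= 2;
        s->elems = realloc(s->elems, s->cap * s->elem_size);
        assert(s->elems != NULL);
    }
    memcpy((char *)s->elems + s->len * s->elem_size, elem_addr, s->elem_size);
    s->len++;
}

/* Destroy the stack, releasing each remaining element first. */
static void stack_dispose(stack *s) {
    if (s->freefn != NULL) {
        for (size_t i = 0; i < s->len; i++) {
            /* freefn gets the ADDRESS of the element, not its value */
            s->freefn((char *)s->elems + i * s->elem_size);
        }
    }
    free(s->elems);
    s->elems = NULL;
    s->len = s->cap = 0;
}

/* Element type is char *, so the address passed in is really a char **. */
static void string_free(void *elem_addr) {
    free(*(char **)elem_addr);
}
```

A stack of int would pass NULL as the free function, since its elements own no extra memory.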

2.1.2.1.1.5 Memory Segments

Soft managed memory:

When a program is loaded into memory, its heap is managed by malloc, realloc, and free.

The memory block allocated for you contains a few extra bytes just before its head: the metadata.

Thus, free(head+offset); is not allowed. free expects the metadata to sit right before the pointer it is given, so passing a pointer with an offset is likely to crash.

Furthermore, freeing an array is not allowed either, because an array declared as a local variable is allocated on the stack and managed by the compiler, so it has no metadata.

The memory manager may split memory into segments and serve a request from a specific segment when the request is smaller than 2^n bytes.

2.1.2.1.1.5.1 Memory compose

Split a large region of memory and manage allocations in it through handles. A handle is a pointer to a pointer to the actual memory.

2.1.2.1.1.5.2 Stack segment

Stack depth is roughly proportional to the number of nested function calls.

When a variable or array is defined inside a function such as main, it lives in that function's stack frame, which moves the stack top. (The stack grows towards lower addresses; similarly, the heap grows towards higher addresses.)

The stack top pointer marks the boundary between the stack and the gap. (The gap is the space between the heap and the stack.)

When a function is called, a stack frame is created for it; when the function exits, the stack top pointer goes back to where it was before the frame.

Relatively slow RAM (Compared to register):

High address        +-----------------------+       +-------------+
                    |                       |      /|    ARG-N    |
                    |                       |     / |    .....    |
                    |                       |    /  |    ARG-1    |
                    |                       |   /   |-------------|
                    |-----------------------|  /    |  <Ret Addr> | <- BP
                    |                       | /    .|-------------|
                    |         Stack         |/    / |   <Old SP>  |
                    |                       |    /  |-------------|
              BP -> |-----------------------| --`   |   Local-1   |
                    |         Frame         |       |   .......   |
              SP -> |-----------------------| ----. |   Local-N   |
                    |                       |\     `|-------------|
                    |                       | \     |    ARGs-N   | <- SP
                    |                       |  \    |             |
                    |                       |   \   |             |
                    |                       |    \  |             |
                    |                       |     \ +-------------+
                    |                       | <- "Gap"
                    |                       |
                    |                       |
                    |                       |
                    |                       |
                    |                       |
                    |-----------------------|
                    |                       |
                    |                       |
                    |         Heap          |
                    |                       |
                    |                       |
                    |                       |
                    |-----------------------|
                    |                       |
                    |     .Section code     |
                    |                       |
Low address         +-----------------------+
2.1.2.1.1.5.3 Memory Management

When allocating memory, the allocator reserves not only the memory you request but also some extra memory for metadata.

|   total space allocated    |
| head | space you allocated |
       ^
       pointer points to

Sometimes the memory manager also uses the free space itself to store metadata about free blocks.

Allocation strategies:

  • Best fit
  • Worst fit
  • First fit
  • Continuous search

Sometimes the allocator returns more space than you asked for, but you can only rely on the space you requested.
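The header layout described above can be sketched with a toy bump allocator. This is only an illustration under stated assumptions: `block_header`, `toy_alloc` and `toy_block_size` are hypothetical names, the arena is a fixed static array, and nothing is ever freed. Real allocators keep more metadata and implement one of the fit strategies listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header. The union pads it so the payload stays aligned. */
typedef union {
    size_t size;          /* number of bytes the caller asked for */
    max_align_t align_;   /* only here to force a safe size and alignment */
} block_header;

#define ARENA_UNITS 64
static block_header arena[ARENA_UNITS];  /* storage counted in header-sized units */
static size_t units_used = 0;

/* Bump allocation: write a header, then hand out the bytes right after it. */
void *toy_alloc(size_t n) {
    size_t unit = sizeof(block_header);
    size_t payload_units = (n + unit - 1) / unit;   /* round the request up */
    if (units_used + 1 + payload_units > ARENA_UNITS) return NULL;
    block_header *h = &arena[units_used];
    h->size = n;
    units_used += 1 + payload_units;
    return h + 1;   /* the caller's pointer points just past the header */
}

/* Step back over the header to recover the metadata. */
size_t toy_block_size(const void *p) {
    assert(p != NULL);
    return ((const block_header *)p - 1)->size;
}
```

Note that the block may be larger than requested because of the rounding, which is why a caller can rely only on the `n` bytes it asked for.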

Compact:

2.1.2.1.1.6 Section IX: Computer architecture

Consider the following code:

int i;
int j;

i = 10;
j = i + 7;
j++;
Assume the following memory segment:

       +-----------+
0xf000 |           |
0xeffc |           |
       |   | i |   | <- BP
       |   | j |   | <- SP
       |           |
       |           |
       |           |
       |           |
       |           |
       |           |
       +- - - - - -+
....
       +- - - - - -+
0x1000 |           |
       +-----------+

Assume i and j are packed together on the stack. BP stores the base address of the stack frame, and SP points at the most recently allocated slot, which here is j.

To access variable i, use [SP+4]. Thus i = 10; can be written as mov dword [sp+4], 10.

For j = i + 7, first load i and then perform the ALU operation:

  • load i: mov r1, [sp+4]
  • add: mov r2, r1 followed by add r2, 7

Then store the result with mov [sp], r2. Finally, j++ becomes inc dword [sp].

2.1.2.1.1.6.1 Load / Store, ALU Operations
2.1.2.1.1.6.2 force conversion

A forced conversion (cast) only tells the compiler to treat a value as another type; the assembler is not involved, because it knows only addresses.
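A small sketch of this idea: the cast below changes how the compiler interprets the bytes at an address, while the address itself stays the same. The helper names `first_byte` and `float_bits` are made up for illustration, and `float_bits` assumes that `float` and `unsigned int` are both 4 bytes wide.

```c
#include <assert.h>
#include <string.h>

/* Read the first byte of an int's storage: same address, narrower view. */
unsigned char first_byte(const int *p) {
    return *(const unsigned char *)p;   /* the cast changes the type, not the address */
}

/* Reinterpret the bit pattern of a float as an unsigned integer. */
unsigned int float_bits(float f) {
    unsigned int u;
    assert(sizeof u == sizeof f);       /* assumption: both are 4 bytes */
    memcpy(&u, &f, sizeof u);           /* copies bytes, no numeric conversion */
    return u;
}
```

On a platform with IEEE 754 floats, `float_bits(1.0f)` yields `0x3f800000`, which has nothing to do with the integer value 1.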

2.1.2.1.1.7 Activation record: function call frame

Given the function:

void foo(int bar, int * baz) {
  char sninke[4];
  short * why;
  // ...
}

The arguments for its parameters and its local variables are placed close together on the stack:

4 byte
        |         | baz
        |         | bar
        | < ret > |
        |         | sninke
        |         | why

When foo is called from another function, such as main:

int main (int argc, char * argv[]) {
  int i = 4;
  foo(i, &i);
  return 0;
}

Initially, the stack may look like this:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC <- sp

Then space is allocated for variable i:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC
        |         |      <- sp

Then i is assigned:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC
        |    4    | i    <- sp

When calling foo, the arguments for foo are pushed onto the stack from right to left:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC
        |    4    | i
        | argument| &i
        | argument| i    <- sp
2.1.2.1.1.8 Section XI: Swap, call in assembly
void foo() {
  int x = 11;
  int y = 17;
  swap(&x, &y);
}

In assembly, with the cdecl calling convention, arguments are pushed in reverse order:

_foo:
  push rbp
  mov rbp, rsp

  sub rsp, 8              ; x, y are 4 bytes each, total 8 bytes
  mov dword [rbp - 4], 11 ; x = 11
  mov dword [rbp - 8], 17 ; y = 17

  lea rax, [rbp - 8]      ; &y, the rightmost argument, goes first
  push rax
  lea rax, [rbp - 4]      ; &x
  push rax
  call swap
  add rsp, 16             ; clean up stack after call

  mov rsp, rbp
  pop rbp
  ret

swap itself may be written as:

void swap(int * a, int * b) {
  int tmp = *a;
  *a = *b;
  *b = tmp;
}

On entry to swap, the saved PC occupies 8 bytes at [rsp] and the two arguments occupy the 16 bytes above it: a is at [rsp + 8] and b is at [rsp + 16], since the program runs on an x86_64 machine where pointers are 8 bytes wide. The leftmost parameter lies at the bottom (lowest address) of the argument area.

In C, using inline assembly:

// Build with GCC or Clang on x86_64 Linux using Intel syntax (-masm=intel).
void __attribute__((naked)) swap(int *ap, int *bp) {

  asm volatile(
      // fetch arguments from stack; [rsp] holds the return address
      "mov rcx, [rsp + 8];\n"  // rcx = ap
      "mov eax, [rcx];\n"      // eax = *ap
      "mov rdx, [rsp + 16];\n" // rdx = bp
      "xchg eax, [rdx];\n"     // *bp = old *ap, eax = old *bp
      "mov [rcx], eax;\n"      // *ap = old *bp

      "ret;\n"
      :
      :
      : "rax", "rcx", "rdx", "memory");
}

void __attribute__((naked)) foo() {

  asm volatile(
      // initialize variables
      // push rbp;
      // mov rbp, rsp;
      // for better if possible
      "sub rsp, 8;\n"
      "mov dword ptr [rsp + 4], 11;\n" // x = 11
      "mov dword ptr [rsp], 17;\n"     // y = 17

      // push arguments right to left; every push moves rsp down by 8
      "lea rax, [rsp];\n"      // &y
      "push rax;\n"
      "lea rax, [rsp + 12];\n" // &x (offset 4, plus 8 for the push above)
      "push rax;\n"

      "call swap;\n"

      "add rsp, 16;\n" // clean up calling

      // clean up stack
      // also possible to use
      // `mov rsp, rbp; pop rbp;`
      // if bp is set
      "add rsp, 8;\n"
      "ret;\n"
      :
      :
      : "rax", "memory");
}

int main(int argc, char *argv[]) {
  foo();
  return 0;
}

Note that this swap is not implemented the way the C code above is written: instead of a temporary variable, it uses xchg.

2.1.2.1.1.9 Pre-process, Compile, Assemble, Link

Source code -> preprocessed code -> assembly code -> object file -> executable file

2.1.2.1.1.9.1 Preprocessor
2.1.2.1.1.9.1.1 #define

#define replaces text that appears in the source file.

  1. Constant replacement

    #define SIZE 1024
    char buf[SIZE];
  2. Parameterized macro

    #define MAX(a, b) ((a) > (b) ? (a) : (b))
    int x = MAX(3, 5);
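Because the replacement is purely textual, the parentheses in MAX matter, and arguments with side effects can be evaluated more than once. The sketch below illustrates both points; `SQUARE_BAD`, `SQUARE` and the helper functions are hypothetical names added for this example.

```c
#include <assert.h>

#define SQUARE_BAD(x)  x * x        /* expands literally, without grouping */
#define SQUARE(x)      ((x) * (x))  /* parenthesised in the same way as MAX */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

int square_bad_of_sum(void) { return SQUARE_BAD(1 + 2); }  /* 1 + 2 * 1 + 2 */
int square_of_sum(void)     { return SQUARE(1 + 2); }      /* ((1 + 2) * (1 + 2)) */

/* The argument text is pasted twice, so the increment runs twice
   whenever the first argument turns out to be the larger one. */
int max_with_side_effect(int *i, int limit) {
    assert(i != 0);
    return MAX((*i)++, limit);
}
```

With `int i = 5;`, calling `max_with_side_effect(&i, 3)` returns 6 and leaves `i` at 7, which a real function call would not do.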
2.1.2.1.1.9.1.2 #include
2.1.2.1.1.9.2 compiler
2.1.2.1.1.10 Section XIII:

What happens if #include <stdio.h> is commented out?

The program can probably still be compiled: older compilers assume an implicit declaration for printf (newer ones may reject it), and the linker still finds the function in the C library.

What happens if #include <assert.h> is commented out?

assert is then treated as an ordinary function instead of a macro, and linking fails because no object file defines that symbol.

void foo() {
  int i;
  int array[4];
  /* out of bounds on purpose: on x86_64 Linux, alignment may leave 3 ints of padding before i */
  for (i = 0; i <= 7; i ++) {
    array[i] = 0;
  }
}

This may loop forever: the out-of-bounds writes can overwrite i with 0, so the loop starts over every time. The behaviour is undefined and depends on the stack layout.

What happens with the following?

void Declare() {
  int array[100];
  for (int i = 0; i < 100; i ++) {
    array[i] = i;
  }
}

void Print() {
  int array[100];
  for (int i = 0; i < 100; i ++) {
    printf("%d", array[i]);
  }
}

If Print is called right after Declare, the two functions have the same stack frame layout, so Print may print the values that Declare stored: Declare does not clear its bit patterns from the stack when it returns. This relies on reading uninitialized memory, which C does not guarantee.

This technique is called "channeling".

2.1.2.1.1.10.1 multiple arguments

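Arguments are pushed from right to left, so the first argument always ends up at a fixed offset from the stack pointer, however many arguments follow. This keeps the compiler's job simple.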

2.1.2.1.1.11 Multiple Threads

The operating system gives each process its own virtual memory, so that a program can assume it owns all of memory.

The kernel tracks and maintains a virtual memory mapping table and uses the MMU to map each process's virtual memory to physical memory.

Program execution is sequential.

When multiple processes share one piece of data, one of them may act on the data after another has already changed it. For example, a process reads a variable and checks that it meets some requirement; just before it operates on the variable, the scheduler switches to another process, which performs the same operation successfully. When the scheduler switches back, the original process does not validate the variable again and performs the operation on stale information, which causes an error.

This situation is called a race condition.

Code that works on shared data this way contains a critical section: while executing inside it, the code cannot validate the shared data again.

The solution is to protect the critical section with a semaphore or a lock. A process that wants to enter the critical section must first acquire the lock.

A semaphore is an integer variable with atomic operations. When it is 0, a process cannot enter the critical section; when it is greater than 0, the process can enter and atomically decreases the semaphore by 1. On leaving the critical section, the process increases the semaphore again, releasing the resource.

Semaphore operations thus acquire and release resources.
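The counting rule described above can be sketched with C11 atomics. This is a minimal, non-blocking illustration and not how an operating system implements semaphores: `toy_sem` and its functions are hypothetical names, and a real semaphore would put the caller to sleep instead of returning false.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int count; } toy_sem;   /* hypothetical type */

void toy_sem_init(toy_sem *s, int initial) {
    assert(initial >= 0);
    atomic_init(&s->count, initial);
}

/* Try to enter: succeeds only while count > 0, decrementing atomically. */
bool toy_sem_try_acquire(toy_sem *s) {
    int seen = atomic_load(&s->count);
    while (seen > 0) {
        /* If another thread changed count in between, the exchange fails
           and seen is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&s->count, &seen, seen - 1))
            return true;
    }
    return false;
}

/* Leave the critical section: give the resource back. */
void toy_sem_release(toy_sem *s) {
    atomic_fetch_add(&s->count, 1);
}
```

Initialised with 1, the semaphore behaves like a lock: the first acquire succeeds, a second one fails until release is called.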

2.1.2.1.1.11.1 Producer Consumer Problem

The producer generates data and puts it into a buffer. The consumer takes data from the buffer and processes it.

The consumer should not take data when the buffer is empty. The producer should not put data when the buffer is full.

Use two semaphores to track the number of empty slots and the number of full slots in the buffer.
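The bookkeeping with two counters can be sketched as follows. This is a single-threaded illustration only: `bounded_buffer`, `bb_try_put` and `bb_try_get` are hypothetical names, the two plain integers stand in for real semaphores, and where this code returns false a real producer or consumer would block.

```c
#include <assert.h>
#include <stdbool.h>

#define SLOTS 4

typedef struct {
    int data[SLOTS];
    int head, tail;
    int empty_slots;   /* stands in for the "empty" semaphore */
    int full_slots;    /* stands in for the "full" semaphore  */
} bounded_buffer;

void bb_init(bounded_buffer *b) {
    b->head = b->tail = 0;
    b->empty_slots = SLOTS;
    b->full_slots = 0;
}

/* Producer side: needs an empty slot, creates a full one. */
bool bb_try_put(bounded_buffer *b, int v) {
    if (b->empty_slots == 0) return false;  /* buffer full: a real producer waits */
    b->empty_slots--;
    b->data[b->tail] = v;
    b->tail = (b->tail + 1) % SLOTS;
    b->full_slots++;
    return true;
}

/* Consumer side: needs a full slot, creates an empty one. */
bool bb_try_get(bounded_buffer *b, int *out) {
    assert(out != 0);
    if (b->full_slots == 0) return false;   /* buffer empty: a real consumer waits */
    b->full_slots--;
    *out = b->data[b->head];
    b->head = (b->head + 1) % SLOTS;
    b->empty_slots++;
    return true;
}
```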

2.1.2.1.1.11.2 Reader Writer Problem

The reader-writer problem is a classic synchronization problem with two types of processes, readers and writers: readers can read shared data simultaneously, while writers need exclusive access to it.

2.1.2.1.1.11.3 Philosophers Dining Problem

Five philosophers sit around a table, and every philosopher needs two forks to eat. When a philosopher wants to eat, they try to pick up the left and right forks. But if all philosophers pick up their left fork first, none of them will ever be able to pick up the right fork.

This is a deadlock.

2.1.2.1.1.11.4 Ice cream Shop Problem
2.1.2.1.1.12 Functional Programming Paradigm

In the functional programming paradigm, every function is treated as a regular mathematical function, which accepts some input and produces some output.

;car
;cdr

car in Scheme extracts the first element of a list, while cdr extracts the rest of the list.

These are already well known, so Mujiu will not explain more about Scheme here.

In Scheme and other Lisps, the names car and cdr come from the machine Lisp was first implemented on, the IBM 704. A machine word there had an address part and a decrement part, and car and cdr stand for "contents of the address part of register" and "contents of the decrement part of register".
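For readers more at home in C, the pair behind car and cdr can be sketched as a struct with two fields. This is only an analogy under simplifying assumptions: the `cell` type is hypothetical, it stores plain integers, and it ignores garbage collection and Scheme's dynamic typing.

```c
#include <assert.h>
#include <stddef.h>

typedef struct cell {
    int car;                  /* first element of the list */
    const struct cell *cdr;   /* rest of the list, NULL for the empty list */
} cell;

/* car: the first element of a non-empty list. */
int car(const cell *c) {
    assert(c != NULL);
    return c->car;
}

/* cdr: everything after the first element. */
const cell *cdr(const cell *c) {
    assert(c != NULL);
    return c->cdr;
}
```

The list (1 2 3) is then three cells chained through their cdr fields, and `car(cdr(list))` yields 2.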

2.1.2.1.2  From The C Programming Language To Theoretical Computer Science (Section I) [S1]
2.1.2.1.2.1 Section I: C Programming Language

To get a first look at computer science, we need to know a programming language; it then leads us to some key concepts of computers and of programming language design.

2.1.2.1.2.2 Intro

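C has a long history. Since it appeared alongside Unix in the 1970s, it has been a favourite of developers around the world, and it is still widely used today. Everything from the dazzling variety of applications down to operating system kernels can be written in C and depends on C code.

For example, the vast majority of the world's servers run Linux, and the Linux kernel is written almost entirely in C. The kernel of every Android phone is also Linux. It is fair to say that C drives most of the devices in the world. (Windows is not used as the example because, first, it is a closed-source product, and second, its kernel is mostly written in Microsoft's own heavily customized C++.)

C is a high-level language, but what is a high-level language?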

2.1.2.1.2.3 High Level Language

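High-level languages are defined relative to low-level languages. By low-level languages we generally mean the assembly languages of different devices. These languages are very powerful, since they operate the CPU directly, and very fundamental: without them, none of the later work could be done.

But they also have a serious problem: they are tightly bound to their platform. A piece of code works only on one specific platform. Even when the logic is similar or identical, you still have to adapt it to each platform's rules, one by one. That is only development, and it already shows how troublesome it is to build programs in a low-level language. When it comes to upgrading software, the process gets even worse and its complexity shoots up.

A high-level language is an abstraction over the common features of low-level languages. It helps programmers write code that can be ported between platforms painlessly, or at least with relative ease.

A low-level language is like a special tool made for one particular device: it can only be used on that device. It can operate the hardware directly, but it is complicated to write. A high-level language, such as C or Python, lets programmers write programs in a form that is easier to understand. The system "translates" your code into instructions the machine understands, so the program can run on different devices without you worrying about low-level details.

When working in C, we can abstract operations such as addition, subtraction, multiplication and division, which map to assembly instructions for data of different widths; and we can abstract variables, which map directly to regions of memory.

Take adding two numbers as an example. In C, addition on any platform is written a + b. In Intel-syntax macro assembly for the x86_64 architecture of IBM-compatible machines (quite a mouthful), it may be ADD AH, BH, or ADD AX, BX, or ADD EAX, EBX, or even ADD RAX, RBX. And this only covers the case where two general-purpose registers take part; with memory operands it gets much more complicated. (With AT&T syntax it gets more complicated still, since AT&T syntax also has to deal with the instruction name.)

This makes porting programs far more convenient: there is no longer any need to adapt by hand to each platform.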

2.1.2.1.2.3.1 Mid-Level Language

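Although C is nominally a high-level language, many people disagree, because C provides no general memory management scheme: all memory must be managed by the programmer by hand. This is convenient for systems programming, but it also causes problems such as memory leaks, and the programmer still has to think about the same boundary issues as in assembly.

Some people therefore call C a mid-level or transitional language. This is only a difference in naming, though.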

2.1.2.1.2.3.2 Compile & Interpret

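A CPU can only understand and run binary machine code, so a computer cannot directly execute code written in human-readable form. The code has to be either compiled or interpreted.

  1. Compilation translates the code into assembly language (or another language), uses an assembler to generate the corresponding binary code, and finally links it into a native executable (which ends up containing the structures the operating system requires):

     source code -> [compile] -> assembly file -> [assemble] -> object binary -> [link] -> executable

  2. Interpretation skips the compilation step: a virtual machine or an interpreter executes the code as it reads the source file:

     source code -> [interpreter] -> output

In practice, the line between compiled and interpreted languages is not sharp for modern languages. Java, for example, is compiled to JVM bytecode, and the JVM then interprets that bytecode.

We call a language compiled or interpreted according to how it is usually run. C requires compilation before it runs, so we consider C a compiled language. Python, which you may be familiar with, is executed by an interpreter, so we consider Python an interpreted language.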

2.1.2.1.2.4 Environment And IDE

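You may enjoy PC games; sometimes a game complains that the DirectX runtime is missing. Programming, like gaming, needs an environment. The environment used for developing programs is called a development environment, and a software system that packages and preconfigures all the tools needed for development together with the environment itself is called an integrated development environment (IDE).

On Windows, the most common IDE for C is Microsoft Visual Studio. However, this IDE and its toolchain are tailored to C++ and C# and are not a great fit for C. Its mandatory project management and its abundance of features can also overwhelm beginners and distract from the core of learning C.

On macOS, Apple provides the Xcode IDE, though hardly anyone uses it unless they have to write Swift.

On Linux, the most common "IDEs" are (Neo)Vim and Emacs, but they do not suit everyone.

Since the platforms are hard to unify, and all three offer a fairly simple way to use the LLVM Clang compiler as a C programming environment, we will set up the environment by hand as the first step of learning C. This is a part that most tutorials, institutions and schools do not teach, yet it is crucial for later study.

Two other parts that I personally consider important are how to use tools, and the difference between tools and knowledge. They are covered in "The Missing Semester of Your Computer Science Education" and "Introduction to Theoretical Computer Science" respectively.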

2.1.2.1.2.4.1 Environment Variables

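Environment variables can be seen as a program's settings: they tell the program how to work. For example, configuring "PATH" helps programs find the files or commands they need.

Put simply, for a program they work like a dictionary index: to look something up, first find the "key", then use the "key" to get the "value".

These key-value pairs control what programs do. The environment variables you need to know now, and which will stay important, are:

  • PATH: the PATH variable is like a signpost that tells the system where to look for the commands you type.
  • For example, when you want to compile a program with gcc, the system searches the directories listed in PATH for the gcc program. If it cannot find it, it reports an error.
  • When we type a command in the console (command line) and try to run it, the operating system searches PATH; if the command is found, it is executed, otherwise an error is reported.
  • PATH is not only used when we run commands ourselves: many other programs also use it to find the programs they launch. (The Linux dynamic linker, ld-linux-x86-64.so, performs a similar search for shared libraries, but through its own list such as LD_LIBRARY_PATH rather than PATH.)
  • Well, for now it is enough to know PATH alone.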
2.1.2.1.2.4.2 Windows

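On Windows, changing environment variables is convenient and safe:

Open File Explorer, right-click "This PC", and choose "Properties" - "Advanced system settings" - "Advanced" - "Environment Variables" to open the configuration window.

To edit an entry, double-click it and the editing dialog appears.

So to install a C development environment by hand, you first download a compiler and then add the directory containing it to PATH as described above. Compared with other approaches, this is inconvenient, and updating the environment later is also a hassle.

Windows does have a simpler way to install a C programming environment: WSL.

WSL stands for "Windows Subsystem for Linux", a tool Microsoft created to improve the developer experience. With WSL we can install and manage a development environment as easily as on Linux itself.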

2.1.2.1.2.4.3 Linux, MacOS & *nix

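On Unix and Unix-like systems, environment variables are usually changed through the user's configuration files. In practice, though, installing a C programming environment on these systems needs hardly any changes to environment variables; a few commands are enough.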

2.1.2.1.2.5 Hello, World

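And so we arrive at our first program: Hello, World!

This example comes from The C Programming Language, and it has accompanied generation after generation of new programmers, carrying our cheer for the new world we create.

"Hello World" is the classic introductory example in programming. It simply prints a sentence to the screen and helps you understand the basic structure of the code and how it runs. Once you can write and run "Hello World", you can move on to more complex programs.

#include <stdio.h>

int main(void) {
  printf("Hello, World!\n");
  return 0;
}

You can type this code into any text editor and save it (not on the desktop) as hello.c.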

Then we can compile it:

  1. Open a terminal.
  2. Enter the directory: cd ${pwd}, where ${pwd} is the directory containing your file.
  3. Check that hello.c exists: type cat hello.c and press Enter. The command prints the content of the whole file. If what is printed does not match what you see in your editor, the file was not saved properly. For example, on my computer, with the code shown above, the command responds with:

    #include <stdio.h>
    
    int main(void) {
      printf("Hello, World!\n");
      return 0;
    }

  4. Finally, type clang hello.c -o hello. It prints nothing if there are no syntax errors or other problems.

We then get a file named hello (hello is the file name; a suffix such as .exe is called the extension). You can find it in the file explorer. This is our executable!

Finally, type ./hello in the terminal to run it, and you will see its output:

Hello, World!

You have now written a complete basic C program. Below we briefly introduce what each part means, so that you can modify the program yourself and write your very own "Hello World".

Try changing the source code so that it prints your name.

2.1.2.1.2.5.1 Explanation

Looks fantastic?

Let us explain the structure of the program.

C programs are always composed in a similar order. Our "Hello, World" program has three parts: the import of a library header file, the entry function (main), and the main expression.

2.1.2.1.2.5.2 Library

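The core of C is small and includes only very basic features; everything else is provided by libraries. Because the language is fairly bare, using a library requires a description file that tells the compiler what the library provides.

In this program, the first line is a piece of text starting with '#'. It says that we bring in the definitions of a library called stdio.

The '#' marks the start of a "preprocessing directive"; here the directive is "include". The include directive is used to include a file, in this case stdio.h.

stdio is short for "Standard Input / Output". It defines the common input and output functions, and it will be the library used most often in the rest of our C programming.

How does the include directive decide which file to include? It depends on how the file name is wrapped. Here stdio.h is wrapped in angle brackets ('<' and '>'), which means the compiler looks in the system paths; if it finds the file, it expands the whole file in place of the directive. If we wrapped stdio.h in double quotes ('"') instead, the compiler would first look in the current directory.

Try creating a stdio.h file in the same directory as hello.c and compiling the program again to see whether anything changes.

And what if you change the angle brackets to double quotes? Keep in mind that the printf "function" discussed below is made known to the compiler by stdio.h.

So what is a function? We will keep you in suspense for now; functions are explained in detail later.

Next comes the body of our program.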

2.1.2.1.2.5.3 main
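int main(void) {
  // ...
}

This is where our program starts executing. Without it, the program cannot run.

Try leaving this part out and writing only the printf("Hello, World!\n"); in the middle. What happens? When we try to run it, we are told that the program is not "legal". This does not mean we broke the law; it means the program does not follow the syntax of C.

If you look at the "PROBLEMS" panel at the bottom of Visual Studio Code, you will also see it report many problems with the file. We call these reports error messages, or errors.

We call this part the "main function definition", and main is the main function.

It can basically be regarded as a fixed form (there are four fixed forms in total, three for hosted environments and one for freestanding environments, but for now this one is all you need).

printf("Hello, World!\n");

is the only real content of our program: it does just one thing, printing "Hello, World!".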

2.1.2.1.2.5.4 Function

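Both parts above mention a concept: the "function". What is a function? A function is a sequence of code, a collection of functionality. By defining functions we can group operations together, which makes programs easier to develop, and we can offer those functions to ourselves or to other people.

Examples are the printf function we used and the main function we defined.

As in mathematics, a function accepts some parameters and produces some output. Like a function of several variables in multivariable calculus,

$f(x, y, z): \mathbb{R}^3 \to \mathbb{R}$

it accepts parameters such as x, y and z and, through a series of transformations, turns them into a single ordinary one-dimensional value.

printf followed by a pair of parentheses is called a function call. It means the same thing as applying a function in mathematics.

printf(...) prints text to the screen in a given format; the name stands for "print (with) format".

The "Hello, World" here is the argument of the function call: it tells printf what to output to the screen.

This is only a brief introduction; printf can do far more than this, and a later chapter is devoted to it.

return 0;

This statement ends the function "main". When execution reaches it, the function stops running and "returns".

This also touches on later material, so for now just remember to end the main function with return 0;.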

2.1.2.1.2.5.5 Expression: Statement.

If you look carefully, you will notice that both things inside the main function end with a semicolon.

The semicolon (';') marks the end of a statement. What is a statement? Statements are the basic units of the C programming language, and every C program is made up of statements. For example, the simplest program is:

int main(){}

It contains nothing but a function definition with an empty body.

Statements come in many forms, but the rule for them is largely the same: apart from a few special cases, everything you write in C ends with a semicolon.

Statements fall roughly into five kinds:

  1. Expression statements
  2. Function calls
  3. Flow-control statements
  4. Compound statements
  5. Empty statements

Each kind is explained in detail later. For now, remember that a statement normally ends with a semicolon.

2.1.2.1.2.6 Types

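C is a statically typed language. That one sentence introduces two new ideas!

  • What is a type?
  • What is static typing?

As a computer language, C really operates on values, and we assign each kind of value a "type" by convention.

For example, we treat integers between $-2147483648\ (-2^{31})$ and $2147483647\ (2^{31}-1)$ as "integers (Integer)". We also need to represent text, hence the "character (Character)" type and the "string ([Character] String)" type.

Why distinguish types at all? Clearly a string cannot be handled as an integer! (Unless you view both as monoids in the sense of category theory... and even then you only get a uniform interface; you still cannot add a string to a number~)

So what is static typing?

Just as mathematics is not only about numbers but mostly about unknowns, programs have their own "unknowns" to work with. When computing something we often need a "variable" to hold intermediate results. Since a variable stores data, it needs a type too: as explained above, different types of data have different properties and cannot be stored the same way.

C goes one step further: to keep a variable's type from becoming unclear after repeated assignments, it makes us fix the type of data a variable can hold when we define it. (That is not the real reason, of course. C needs the type information to allocate space for the variable, and since different types generally need different amounts of space, they cannot be mixed. The "memory model" part explains this in detail, meow~ >w<) This is what we call a "static type" system.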

2.1.2.1.2.6.1 Literal

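A literal is like the coefficients and constants we write down when solving a maths problem: a literal is a constant that appears directly in the program.

Unlike a constant, though, a literal truly cannot be changed. A constant in a program merely states that a variable is not supposed to change; with some special tricks, a constant can be persuaded to open its heart and accept a new value.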

2.1.2.1.2.6.2 Basic Data Types

For simple programming tasks, C defines some basic data types. They cover numbers, text and logic (well, not really logic).

2.1.2.1.2.6.2.1 Integer

The types we use most, and introduce first, are the integer family:

  • short: short integer. It needs less memory than int, usually only 16 bits, and can accordingly represent fewer values.
  • int: integer, the default integer type in C. It is usually 32 bits, of which 31 bits hold the magnitude; the range $-2147483648 \sim 2147483647$ mentioned above is what it can represent.
  • long: long integer, possibly longer than int; generally used only for large values.
  • long long: a really long integer, guaranteed to be at least 64 bits.

Whenever we write an integer in code, it naturally carries one of these types. For example:

short s = 0;
int i = 65536;
long l = 2147483647;
long long ll = 2147483648ll;

Note: all the code above is written inside the main function!

Here 0, 65536 and 2147483647 are all "int" literals, while 2147483648ll is a "long long" literal.
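One way to check which type the compiler gives a literal is C11's `_Generic`. The macros `TYPE_NAME` and `IS_TYPE` below are hypothetical helpers written for this sketch, and the comments assume a platform where int is 32 bits wide.

```c
#include <assert.h>
#include <string.h>

/* Map an expression to the name of its type (C11 _Generic selection). */
#define TYPE_NAME(x) _Generic((x),            \
    int: "int",                               \
    unsigned int: "unsigned int",             \
    long: "long",                             \
    unsigned long: "unsigned long",           \
    long long: "long long",                   \
    unsigned long long: "unsigned long long", \
    default: "other")

/* True when the expression has the named type. */
#define IS_TYPE(x, name) (strcmp(TYPE_NAME(x), (name)) == 0)

int literals_have_expected_types(void) {
    return IS_TYPE(0, "int")                        /* small values are int      */
        && IS_TYPE(65536, "int")                    /* assumes 32-bit int        */
        && IS_TYPE(2147483648ll, "long long")       /* the ll suffix forces this */
        && IS_TYPE(1u, "unsigned int");             /* the u suffix forces this  */
}
```

The type of a variable does not change the type of the literal on the right-hand side: in `short s = 0;` the literal 0 is still an int that gets converted on assignment.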

What are the type names and equals signs in front of these numbers for? You will find out soon! First, let us meet the variants of the integers:

  • signed: the signed prefix, meaning the type holds signed data. Integer types are signed by default.
  • unsigned: when we do not need negative values, we can use an unsigned type. Declaring a variable unsigned changes its range from half negative, half positive to entirely non-negative, roughly like adding a superscript $+$ to $\mathbb{Z}$ to get $\mathbb{Z}^+$; the positive range also doubles.
  • Although they are called prefixes, they can also stand alone: when only the prefix appears, C (the standard) automatically supplies an int.

A few more examples:

signed int i = 2147483647;
unsigned int u = 2147482647u;

An integer may be written as:

<number>*<suffix>     decimal      ; 10, 11, 5
0<number>*<suffix>    octal        ; 0, 01, 077
0x<number>*<suffix>   hexadecimal  ; 0x0, 0x1a, 0xff
0b<number>*<suffix>   binary       ; 0b1, 0b0, 0b10 (C23, or a compiler extension before that)
2.1.2.1.2.6.2.2 Literal Suffix

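You may have noticed that some of our numbers are followed by letters. These letters, such as ll and ull, are called literal suffixes. They qualify a literal so that the compiler handles the value correctly.

Look at this line:

long long ll = 2147483648ll;

Try removing the ll suffix from the literal and see what happens. On current compilers the program still compiles and behaves the same.

This is because an integer literal written without a suffix is an int by default, and when the value does not fit in an int, the compiler moves on to long and then long long. A suffix is needed when you want a wider or unsigned type regardless of the value, as in 1ll or 1u.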

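2.1.2.1.2.6.2.3 Real numbers: float & double

Besides integers we naturally have fractional numbers. In C they are called "binary floating-point numbers", or "floating-point numbers" for short.

C has three commonly used floating-point types:

  • float: single-precision floating point, 32 bits wide. Unlike the integers, it has no exactly representable range.
  • double: double-precision floating point, more precise than float.
  • long double: an upgraded double.

Why is it called floating point? Because the position of its decimal point is not fixed.

But what is a fixed decimal point? Aren't the digits after the point unlimited in general? The answer again lies in the limits of computer representation.

For example, an amount of money can be written as "XX yuan Y jiao Z fen". To express everything in yuan, we write "XX.YZ yuan". We have unified all units into yuan and fixed jiao and fen as two digits after the decimal point. This is a "fixed-point number", or more precisely a "fixed-point number scaled by 100".

With fixed-point numbers understood, "floating-point numbers" (or "moving-point numbers", a name I just made up) are easy. Fixed-point numbers are too rigid and suit only special cases. So if we also record where the decimal point is, we can represent fractional numbers of any form. Thus floating point was born. The fixed-point number above is decimal, with base 10; computers represent data in binary, so the floating-point numbers we actually use are binary as well. That explains the name "binary floating-point number".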

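2.1.2.1.2.6.2.4 Type Promotion

Mathematics also has operations that mix integers and fractions. Try it first: what result do we get in C from an operation that should produce a fraction?

printf("%d", 1 / 2);

The result is 0. Strange, isn't it?

In C, an operation between two integers only yields an integer. To get a floating-point result, a floating-point number must take part in the operation, for example:

printf("%f", 1 / 2.0);

This gives 0.5.

Why? In C, when the operands of an operation have different types, the one with the smaller range is converted to the type with the larger range before the operation. This process is called automatic type conversion.

Here the int 1 meets 2.0, a floating-point number of type double. Floating-point types have a larger range than integers, so the int is promoted to double and the operation becomes 1.0 / 2.0 = 0.5.

The automatic type conversion chart:

small -------------------------------------------------------> large
char, short, int -> unsigned int -> long -> long long -> float -> double -> long double

From left to right, types are promoted automatically.

Conversions that start from the integer types are called "integer promotion". char, short and int sit at the same stage of the chart because the same integer promotion applies to them: by the rules of C, every type whose range is smaller than int is promoted to int before taking part in an operation.

A char, a short int, or an int bit-field, all either signed or not, or an object of enumeration type, may be used in an expression wherever an integer may be used. If an int can represent all the values of the original type, the value is converted to int; otherwise it is converted to unsigned int. This process is called integral promotion.

From the assembly point of view, this amounts to extending a small register into a larger one, for example extending the AH register to EAX.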
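The promotion rules above can be observed directly. The functions below are a small sketch with hypothetical names; the comment in `add_chars_narrow` assumes that a char is 8 bits wide.

```c
#include <assert.h>

/* char operands are promoted to int before the addition happens. */
int add_chars(signed char a, signed char b) {
    return a + b;                    /* computed in int, so 100 + 100 is 200 */
}

/* Storing the sum back into a narrow type cuts it down again. */
unsigned char add_chars_narrow(unsigned char a, unsigned char b) {
    return (unsigned char)(a + b);   /* 200 + 100 = 300 in int, reduced modulo 256 */
}

/* Mixed int/double arithmetic converts the int operand to double. */
double half(int n) {
    return n / 2.0;
}

/* With two int operands the division stays in int and drops the fraction. */
int half_truncated(int n) {
    return n / 2;
}
```

`sizeof((signed char)1 + (signed char)1)` equals `sizeof(int)`, which shows that the sum of two chars already has type int.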

2.1.2.1.2.6.2.5 String & Char

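Besides numeric values, there are the character type and strings.

In maths class, we write the computed result right after the word "Solution"; this is really an "output" step. So how does a computer, which also does mathematical computation, organize its output? With strings, of course:

printf("This Is A String");

It is the familiar printf again; what differs is the string it operates on.

A string, as the name suggests, is a sequence of consecutive characters. A string literal is normally written as consecutive text enclosed in double quotes.

How do we write a single character?

Simple: besides double quotes we have single quotes. Ideally, any single character enclosed in single quotes is a character. Because some characters cannot be typed on a keyboard at all, there are other forms too:

  • 'c': a character in single quotes
  • '\ooo': a character given in octal
  • '\xhh': a character given in hexadecimal

Some characters do not fit in the width of a character (8 bits), so there is another character type, the "wide character":

  • L'c': a wide character in single quotes
  • L'\ooo': a wide character given in octal
  • L'\xhhhh': a wide character given in hexadecimal

As you can see, a wide character literal is just an ordinary character literal with an "L" prefix. In the same way, we can turn an ordinary string literal into a wide string:

wprintf(L"Hello World");

Note: Chinese characters all fall outside the range of the character type, so why can an ordinary string hold Chinese text, as in printf("你好, 世界");? Because a string does not necessarily use one character variable per character. This may sound like a tongue twister now, but it becomes clear once we discuss how strings are actually represented.

So wide strings are not really necessary for representing text.

By the way, did you notice that the integer types above did not include an 8-bit integer, the byte type that is common in other languages? That is because C uses char in place of an 8-bit integer. Fortunately, 8-bit values are not needed that often in C, so the substitution is not a big problem, and when we do need one, char can stand in.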

2.1.2.1.2.6.3 Logical Values

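Of course, computers do not only handle numbers. Built from diodes, transistors and logic gates, they have a natural binary representation, and binary logic is also something programs process.

Starting simple: logic has two states, yes or no. C represents them in a very simple way:

  • the value 0: no (false),
  • anything else: yes (true).

Simple, right?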

2.1.2.1.2.6.4 Void Type

The types above are all concrete. But what if we need to express "there is nothing here"?

That is what the void type is for. We will not explain much here; we will see it in use later.

2.1.2.1.2.7 Mathematics Operations

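Having numbers alone does not let us compute; we also need operations defined on them.

First, all numeric values, whether from the integer family or the floating-point family, support the familiar four arithmetic operations +, -, * and /.

Operation  Description                                                          Form
+          Adds two numbers and returns the sum                                 A + B
-          Subtracts the second number from the first, returns the difference   A - B
*          Multiplies two numbers and returns the product                       A * B
/          Divides the first number by the second and returns the quotient      A / B

Because taking remainders is so useful, C also defines it for both integers and floating-point numbers, and calls the operation "modulo":

Operation  Description             Form          Comment
%          Modulo                  A % B
fmod       Floating-point modulo   fmod(A, B)    A function call; applies to double
fmodf      Floating-point modulo   fmodf(A, B)   A function call; applies to float
fmodl      Floating-point modulo   fmodl(A, B)   A function call; applies to long double

Next are four operators used mostly on integer variables, the "increment/decrement operators":

Operation  Description  Form  Comment
++         Increment    A++   Returns the original value, then increases the variable by 1
++         Increment    ++A   Increases the variable by 1, then returns the new value
--         Decrement    A--   Returns the original value, then decreases the variable by 1
--         Decrement    --A   Decreases the variable by 1, then returns the new value

As you can see, the increment and decrement operators follow a pattern. If the operator comes before the variable, the variable is updated first and then its value is taken. If the operator comes after the variable, the value is taken first, and the variable is increased or decreased once the value has taken part in the operation.

int i = 0;
printf("%d", i++); // => 0, i = 1;
printf("%d", ++i); // => 2, i = 2;
printf("%d", i);   // => 2
printf("%d", i--); // => 2, i = 1;
printf("%d", --i); // => 0, i = 0;
printf("%d", i);   // => 0

Notice that these operators are described as acting on "variables" rather than on values. What is a variable? As mentioned before, a variable is something that stores a value. Since variables store values and can take part in operations, there are naturally operators that act on the value a variable stores. Besides the increment and decrement operators discussed here, there are others, such as the assignment operators.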

2.1.2.1.2.7.1 Relation Operations

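Besides arithmetic, we can also compare values. In C, the operators that compare the size of values are called "relational operators".

Relational operators work on all numeric values. For strings, comparison is so common that a string comparison function is included in the standard library. Do you remember what a "library" is? A library is something written by other people, not built into the C language itself, that defines a set of useful functions for us to import.

Well, we digress. Here are all the common relational operators (and functions):

Operation  Description             Form             Comment
==         Equal                   A==B             Returns 1 if A equals B, otherwise 0
!=         Not equal               A!=B             Returns 1 if A does not equal B, otherwise 0
>          Greater than            A>B              Returns 1 if A is greater than B, otherwise 0
<          Less than               A<B              Returns 1 if A is less than B, otherwise 0
>=         Greater than or equal   A>=B             Returns 1 if A is greater than or equal to B, otherwise 0
<=         Less than or equal      A<=B             Returns 1 if A is less than or equal to B, otherwise 0
strcmp     String comparison       strcmp(A, B)     Returns 0 if the strings are equal, otherwise a negative or positive value by lexicographic order
memcmp     Memory comparison       memcmp(A, B, N)  Compares the first N bytes; returns 0 if they are equal, otherwise a negative or positive value

Note that C has no chained inequalities: an expression like $A > B > C$ cannot be written with its mathematical meaning.

What if you do write such code by accident, say 1 < a < 10?

C treats it as a sequence of operations: the first is evaluated, and its result takes part in the next. Such sequences follow precedence rules, just as multiplication and division always come before addition and subtraction in arithmetic.

For the expression above, 1 < a is evaluated first, and its result, whether 1 or 0, is then compared with 10. So the whole expression is always 1.

So be careful not to write "chained inequalities".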
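The pitfall is easy to reproduce. In the sketch below, `chained` is the accidental form and `in_range` is the intended test written with the logical AND operator introduced in the next subsection; both function names are made up for this example, and compilers usually warn about the first one.

```c
#include <assert.h>

/* Parsed as (1 < a) < 10. The inner comparison yields 0 or 1,
   and both of those are smaller than 10. */
int chained(int a) {
    return 1 < a < 10;
}

/* The intended range test needs two comparisons joined by &&. */
int in_range(int a) {
    return 1 < a && a < 10;
}
```

For a = 100 the chained form still yields 1, while `in_range(100)` yields 0.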

2.1.2.1.2.7.2 Logical Operations

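Logical operations are also very common in C. What are they?

A logical operation strings several logical values together and determines whether the final result is true or false.

As just mentioned, C has no chained inequalities. How then do we express a chained relation? This is where logical operations come in.

There are three main logical operations: or, and, not:

Operation  Description  Form  Comment
&&         Logical AND  A&&B  Returns 1 if both A and B are non-zero, otherwise 0
||         Logical OR   A||B  Returns 1 if at least one of A and B is non-zero, otherwise 0
!          Logical NOT  !A    Returns 1 if A is 0; returns 0 if A is non-zero

From this you can also see that logical and/or/not are quite different from the operations of logic gates, so the bitwise logical operations will be introduced separately in detail later.

Back to expressing a chained inequality: just write

1 < a && a < 10

Note that the logical operators && and || "short-circuit". That is, if the left operand already determines the overall result, the right operand is not evaluated and the result is returned directly.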
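Short-circuiting becomes visible when the right operand has a side effect. The sketch below counts how often the right-hand side is actually evaluated; `touch`, `touch_count`, `and_demo` and `or_demo` are hypothetical names for this illustration.

```c
#include <assert.h>

static int calls = 0;             /* how many times touch() has run */

/* Returns its argument and records that it was evaluated. */
int touch(int value) {
    calls++;
    return value;
}

int touch_count(void) {
    return calls;
}

/* touch() runs only when left is non-zero. */
int and_demo(int left) {
    return left && touch(1);
}

/* touch() runs only when left is zero. */
int or_demo(int left) {
    return left || touch(0);
}
```

Calling `and_demo(0)` or `or_demo(1)` leaves the counter unchanged, because the left operand alone already decides the result.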

2.1.2.1.2.7.3 Associativity

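As mentioned above, precedence and associativity decide the order in which an expression with several operators is evaluated. What exactly are the rules?

In the table below, precedence decreases from top to bottom: operators in a higher row are applied first. The last column gives the associativity, the direction in which operators of the same row are grouped.

Operations                             Description     Associativity
() [] -> . ++ --                       Postfix         Left to right
+ - ! ~ ++ -- (type) * & sizeof        Unary           Right to left
* / %                                  Multiplicative  Left to right
+ -                                    Additive        Left to right
<< >>                                  Shift           Left to right
< > <= >=                              Relational      Left to right
== !=                                  Equality        Left to right
&                                      Bitwise AND     Left to right
^                                      Bitwise XOR     Left to right
|                                      Bitwise OR      Left to right
&&                                     Logical AND     Left to right
||                                     Logical OR      Left to right
? :                                    Conditional     Right to left
= += -= *= /= %= >>= <<= &= ^= |=      Assignment      Right to left
,                                      Comma           Left to right

Complicated, isn't it? Don't worry: whenever you are unsure about precedence, simply put parentheses around the part you want evaluated first. The rest matches the intuition you already have from mathematics.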
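
A small sketch of how precedence, associativity and parentheses interact. The function names are invented for this example.

```c
/* Multiplication binds tighter than addition: a + (b * c). */
int without_parens(int a, int b, int c) {
    return a + b * c;
}

/* Parentheses force the addition to happen first. */
int with_parens(int a, int b, int c) {
    return (a + b) * c;
}

/* Subtraction groups left to right: (a - b) - c. */
int minus_chain(int a, int b, int c) {
    return a - b - c;
}
```

With a = 2, b = 3, c = 4 the first function yields 14 and the second 20; minus_chain(10, 4, 3) yields 3, not 9.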

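You may notice that, besides the basic arithmetic operators we have already covered, the table contains some operators we have not seen yet.

Looking closely, apart from logical AND and logical OR, the table also lists bitwise AND, OR, XOR and NOT. We will get to them shortly.

P.S. Another important group is the family of assignment operators, which will be introduced after the full tour of C syntax.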

2.1.2.1.2.7.4 Binary Calculation

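Now we need a little mathematics: binary arithmetic.

First, what is binary arithmetic? It is arithmetic on binary numbers. That sounds like a tautology (and it is one), but it carries quite a lot of meaning.

To begin with, it says that the objects being operated on are binary numbers, that is, numbers that carry over to the next digit on reaching two.

The radix of binary is 2, and each digit can only be 0 or 1.

Binary numbers have some notable properties. The most important advantage is that each digit has only two states, which matches the on and off states of a circuit and makes the computer's work easier. Another is that binary converts easily to and from hexadecimal and octal, although this is really an advantage of hexadecimal and octal, whose radices are both powers of two.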

2.1.2.1.2.7.5 Radix Convert

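Binary is friendly to computers but awkward for people, because we deal with decimal all the time.

So we have to handle "radix conversion".

Binary and decimal both represent numbers from the same set, so they can be converted into each other by fixed rules.

To convert binary to decimal, multiply each digit by the corresponding power of two and add up the results. It sounds complicated but is simple in practice. For the binary number 1011, the decimal value is:

$$(1011)_2 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = (11)_{10}$$

Converting decimal to binary works the other way round: keep dividing the decimal number by two and record the remainders.

$$\begin{aligned}
11 \div 2 &= 5 \ \text{remainder}\ 1 \\
5 \div 2 &= 2 \ \text{remainder}\ 1 \\
2 \div 2 &= 1 \ \text{remainder}\ 0 \\
1 \div 2 &= 0 \ \text{remainder}\ 1
\end{aligned}$$

Finally, writing the remainders from bottom to top gives the binary number, here 1011.

Conversion between binary and hexadecimal or octal was said to be very convenient. How convenient, exactly? For binary to hexadecimal, group the digits in fours, pad the high end with 0 if needed, and replace each group with one hexadecimal digit. Octal is similar: group the digits in threes, pad the high end with 0, and replace each group with one octal digit.

Taking 1011 as the example again:

$$(1011)_2 = (B)_{16}, \qquad (1011)_2 = (001\,011)_2 = (13)_8.$$

The reverse direction works in exactly the same way and is just as convenient.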
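
The two conversion rules can be sketched in C roughly as follows. The function names to_binary and from_binary are made up for this illustration, and the caller is assumed to pass a buffer large enough for every bit of an unsigned int plus the terminating '\0'.

```c
#include <limits.h>
#include <stddef.h>

/* Decimal to binary: writes the binary digits of value into out,
 * most significant digit first, and returns the number of digits. */
size_t to_binary(unsigned int value, char *out) {
    char reversed[sizeof(unsigned int) * CHAR_BIT];
    size_t count = 0;

    /* Repeated division by 2: each remainder is the next digit, lowest first. */
    do {
        reversed[count++] = (char)('0' + value % 2);
        value /= 2;
    } while (value != 0);

    /* The remainders came out lowest digit first, so copy them backwards. */
    for (size_t i = 0; i < count; i++) {
        out[i] = reversed[count - 1 - i];
    }
    out[count] = '\0';
    return count;
}

/* Binary to decimal: doubling the running total before adding each digit
 * is the same as summing digit * 2^position. */
unsigned int from_binary(const char *digits) {
    unsigned int value = 0;
    for (size_t i = 0; digits[i] != '\0'; i++) {
        value = value * 2 + (unsigned int)(digits[i] - '0');
    }
    return value;
}
```

Calling to_binary(11, buf) fills buf with "1011", and from_binary("1011") gives 11 back.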

2.1.2.1.2.7.6 Bitwise Operations

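Besides the ordinary arithmetic we know from decimal, binary numbers support some special operations of their own. In C these appear as the bitwise operations.

In a computer, gate circuits provide the following logic gates: AND, OR, NOT, NAND, NOR, XOR and XNOR.

Their behaviour is shown in the table below:

Operation  Description    Form      A     B     Result
AND        AND            A AND B   1010  1100  1000
OR         OR             A OR B    1010  1100  1110
XOR        Exclusive OR   A XOR B   1010  1100  0110
NAND       NOT AND        A NAND B  1010  1100  0111
NOR        NOT OR         A NOR B   1010  1100  0001
XNOR       Exclusive NOR  A XNOR B  1010  1100  1001
NOT        NOT            NOT A     1010  -     0101

The rules are in fact very simple:

  • an AND gate outputs 1 if and only if both inputs are 1, otherwise 0;
  • an OR gate outputs 1 as soon as one input is 1, otherwise 0;
  • a NOT gate inverts its input: if the input is 1 it outputs 0, otherwise 1;
  • a NAND gate is an AND gate inverted: it outputs 0 only when both inputs are 1, otherwise 1;
  • a NOR gate is an OR gate inverted: it outputs 1 only when both inputs are 0, otherwise 0;
  • for an XOR gate the key word is "exclusive": it outputs 1 when the two inputs differ, otherwise 0;
  • an XNOR gate is an XOR gate inverted: it outputs 1 when the two inputs are the same, otherwise 0.

So every gate with "not" in its name can be built from AND, OR and NOT, and all of the gates can in turn be built from NAND gates alone.

Logic gates exist in the hardware underneath, and C has corresponding bitwise operations. A bitwise operation applies a gate to every bit position of a multi-bit binary number. There are four of them:

Operation  Description  Form   Comment
&          Bitwise AND  A & B  A result bit is 1 if the corresponding bits of A and B are both 1
|          Bitwise OR   A | B  A result bit is 1 if at least one of the corresponding bits of A and B is 1
^          Bitwise XOR  A ^ B  A result bit is 1 if exactly one of the corresponding bits is 1, otherwise 0 (different gives 1, same gives 0)
~          Bitwise NOT  ~A     Every 0 bit becomes 1 and every 1 bit becomes 0
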
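
The gate table above can be reproduced with C's bitwise operators. This is a sketch with invented function names; because ~ inverts every bit of the whole integer, the results of the inverting gates are masked down to the 4 bits used in the table.

```c
/* The three gates that have a direct two-operand C operator. */
unsigned int bit_and(unsigned int a, unsigned int b) { return a & b; }
unsigned int bit_or(unsigned int a, unsigned int b)  { return a | b; }
unsigned int bit_xor(unsigned int a, unsigned int b) { return a ^ b; }

/* NOT, NAND, NOR and XNOR on 4-bit values: invert, then keep the low 4 bits. */
unsigned int not4(unsigned int a)                  { return ~a & 0xFu; }
unsigned int nand4(unsigned int a, unsigned int b) { return ~(a & b) & 0xFu; }
unsigned int nor4(unsigned int a, unsigned int b)  { return ~(a | b) & 0xFu; }
unsigned int xnor4(unsigned int a, unsigned int b) { return ~(a ^ b) & 0xFu; }
```

With A = 0xA (1010) and B = 0xC (1100), these functions give the same results as the Result column of the gate table.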
2.1.2.1.2.7.7 Overflow

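A computer works with binary numbers, but its capacity is finite; it cannot represent the arbitrarily large integers of mathematics.

So when a number exceeds the range the computer can represent, an "overflow" occurs. On most computers the bits that overflow are thrown away, and only a flag is set to record that an overflow happened.

Most of the time we try to avoid overflow, because it makes results differ from what we expect. When defining a variable, estimate the range of the data in advance and choose a suitable type for each kind of data.

But bit-level behaviour like this is not always a bad thing; sometimes it can be exploited. A famous example of such a trick is the "Quake III" fast inverse square root, which treats the raw bits of a floating-point number as an integer and then refines the result with one step of Newton's method.

The way computers represent negative numbers is also closely tied to overflow.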
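
A small sketch of overflow in C. The function names are invented for this example. Unsigned arithmetic is defined to wrap around, while overflowing a signed int is undefined behaviour, so the second function checks the range before the addition is ever performed.

```c
#include <limits.h>

/* Unsigned addition wraps around: the bit carried out of the top is dropped. */
unsigned int wrap_add(unsigned int a, unsigned int b) {
    return a + b;
}

/* Returns 1 if a + b would not fit in an int, without performing the addition. */
int add_would_overflow(int a, int b) {
    if (b > 0) {
        return a > INT_MAX - b;
    }
    return a < INT_MIN - b;
}
```

For example, wrap_add(UINT_MAX, 1) is 0, and add_would_overflow(INT_MAX, 1) is 1.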

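2.1.2.1.2.7.8 2's Complement

The data a computer can represent is finite. Early CPUs could only calculate with 8-bit binary numbers, which is very small: only values from 0 to 255 can be represented. Even today, computers natively handle only 64-bit data. As long as we only consider non-negative numbers this is not much of a problem: within the representable range, adding two numbers directly gives the desired result, and even an overflow in the addition is relatively easy to deal with.

But once negative numbers are needed, things change. We have to find a way to tell whether a number is positive or negative.

The most naive idea is to give up one bit of range and use that bit to record the sign. This gives the "sign-magnitude representation" of integers.

When the value to be represented is positive, the sign-magnitude form is identical to the true value. When it is negative, the highest bit is written as 1. In other words, the highest bit serves as a sign bit that records whether the number is positive or negative.

Sign-magnitude causes serious trouble in arithmetic. When a negative number, whose highest bit is 1, is added to a positive number with ordinary binary addition, the result can be wrong: a negative number of even larger magnitude.

    0000'0000 0000'0111   (+7)
  + 1000'0000 0000'0111   (-7)
 -----------------------
    1000'0000 0000'1110   (-14)

So arithmetic involving negative numbers cannot use plain sign-magnitude, where the highest bit of a negative number is simply set to 1.

An ideal representation of negative numbers must guarantee that a negative number plus the corresponding positive number comes out as 0 (with a 1 carried out of the highest bit as overflow).

To get closer to that, we keep the sign bit and invert all the value bits, which amounts to inverting every bit of the corresponding positive number. This gives the "ones' complement" (1's complement).

Ones' complement has a problem of its own. It avoids the larger negative number, but a positive number plus its negative no longer gives the original 0. It gives all ones instead, which leads to the problem of having both +0 and -0.

    0000'0000 0000'0111   (+7)
  + 1111'1111 1111'1000   (-7)
 -----------------------
    1111'1111 1111'1111   (-0)

Since a number plus its negative does not come out as 0, we simply add 1 to make up the difference: add 1 to the ones' complement result, discard the overflow, and the final result is the true 0 we wanted.

For practical use, this extra 1 is folded into the representation of the negative number itself. That gives the "two's complement" (2's complement).

Of course, this is just the conclusion reached from practice; two's complement has a deeper meaning behind it.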
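
The three 16-bit additions discussed above can be replayed in C. This is only a sketch: the function names are invented, and uint16_t from <stdint.h> is used so that the carry out of bit 15 is dropped exactly as in the diagrams.

```c
#include <stdint.h>

/* 16-bit addition; the carry out of the highest bit is discarded. */
uint16_t add16(uint16_t a, uint16_t b) {
    return (uint16_t)(a + b);
}

/* Ones' complement negation: invert every bit. */
uint16_t ones_complement_neg(uint16_t x) {
    return (uint16_t)~x;
}

/* Two's complement negation: invert every bit, then add 1. */
uint16_t twos_complement_neg(uint16_t x) {
    return (uint16_t)(~x + 1u);
}
```

Adding 7 to its ones' complement negation gives 0xFFFF (the "-0" pattern), while adding 7 to its two's complement negation gives exactly 0.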

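2.1.2.1.2.7.9 N's Complement

An N's complement system is really the additive group of residue classes modulo N. Consider

$$\mathbb{Z}_n = \mathbb{Z} \bmod n$$

It satisfies closure and associativity, which gives the residue group modulo $n$ over $\mathbb{Z}$.

Given $n$, there are $n$ residue classes modulo $n$, and for $a$, $b$ with $\gcd(n, a) = 1$, the values $a \times r_i + b$ form a complete residue system modulo $n$.

For a given $n$, taking $b = n - a$ gives $a + b \equiv 0 \pmod{n}$. If we require $a \le n - 1$, each negative number is congruent modulo $n$ to a corresponding positive number, and $n$ is called the complement constant.

Since $-a$ is the additive inverse of $a$, taking the complement with respect to $M$ gives $-a = M - a$ with $M = 10^n$. For this $M$ we have $-0 = M$ and $+0 = 0$, which are congruent modulo $M$.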

2.1.2.1.2.7.10 Bitwise Shift

Apart from the regular bitwise operations, there are some special ones as well. Can you imagine shifting every digit of a number?

We have already mentioned floating-point numbers. You might think that a floating point is a kind of digit shift, but a floating-point number really just moves the position of the point.

In a bitwise shift, the point stays fixed just below bit #0, and all digits move directly right or left.

  • Logical Shift Right: shift all digits right, towards bit 0. Every digit pushed below bit 0 is discarded, and the vacated high positions are filled with 0.

     | 0000'1001 0010'1111 | =>
    0 | 0000'1001 0010'111 | 1 =>
    00 | 0000'1001 0010'11 | 11 =>
    ...

  • Arithmetic Shift Right: mostly the same as the logical shift right, but the high positions are filled according to the sign bit.

     | 0000'1001 0010'1111 | =>
    0 | 0000'1001 0010'111 | 1 =>
    00 | 0000'1001 0010'11 | 11 =>
    ...

    For positive numbers it behaves exactly like the logical shift.

     | 1111'1001 0010'1111 | =>
    1 | 1111'1001 0010'111 | 1 =>
    11 | 1111'1001 0010'11 | 11 =>
    ...

    For negative numbers the padding digit is 1 instead.

  • Shift Left: shift all digits left, towards the highest position. Every digit pushed beyond the highest position is discarded, and the vacated low positions are filled with 0.

       <= | 0000'1001 0010'1111 |
     <= 0 | 000'1001 0010'1111 | 0
    <= 00 | 00'1001 0010'1111 | 00
    ...

Operation  Description  Form    Comment
<<         SHL          A << B  -
>>         SHR          A >> B  Different machines may choose a different SHR method, logical or arithmetic; in C the choice for negative signed values is implementation-defined

That is a brief look at the bitwise shift operations. You may notice that shifts really just perform multiplication and division.

How?

SHL by n multiplies a number by 2^n, and SHR by n divides it by 2^n.

All discarded digits are treated as overflow.
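
The shift diagrams can be checked with a short sketch on 16-bit unsigned values. The names shl16 and shr16 are invented here; unsigned types are used so that the right shift is always the logical one, and the shift count is assumed to be less than 16.

```c
#include <stdint.h>

/* Shift left within 16 bits: digits pushed past bit 15 are discarded. */
uint16_t shl16(uint16_t x, unsigned int n) {
    return (uint16_t)((unsigned int)x << n);
}

/* Logical shift right: digits pushed below bit 0 are discarded. */
uint16_t shr16(uint16_t x, unsigned int n) {
    return (uint16_t)(x >> n);
}
```

shl16(5, 3) is 40 (5 times 2^3) and shr16(40, 3) is 5 again, while shr16(41, 3) is also 5 because the remainder is discarded.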

2.1.2.1.2.8 Syntax

C is a language for communicating with the computer, and like any language it has its own set of syntax rules.

In the previous sections we saw that code which does not follow these rules is rejected with an "illegal" error.

So it is worth going through the syntax rules of C systematically.

Here is our example program:

/// file: main.c
#include <stdio.h>

// main function, the entry
// (the third parameter, envp, is a common extension rather than standard C)
int main(int argc, char* argv[], char* envp[]) {
  int integer_value;
  float float_value = 1.0;

  printf("Hello, World!\n" /* comment can appear any where */);
  integer_value = 10;

  printf("Calculate a + b: %d + %f = %f\n", integer_value, float_value, float_value + integer_value);
  return 0;
}

/* foo function, void parameter and empty body */
void foo(void) {
  // do sth.
}

In the program above, several lines contain things we have not met before.

We will explain them all in this chapter.

2.1.2.1.2.8.1 Statements

The first thing to introduce is the definition of a statement.

A C program is composed of statements, as mentioned before.

Statements define the operations the program will execute. Each statement does something.

In C, every statement ends with a semicolon (';'), except for the few forms that are explicitly defined not to need one.

For example,

  int integer_value;
  float float_value = 1.0;
  printf("Hello, World!\n");
  integer_value = 10;

are all statements.

Multiple statements can also be written on the same line:

int i; i = 1;

Here we have written two statements, int i; and i = 1;

So a line break between two statements is not required.

Line breaks are added for looks and clarity.

Also, because the end of a statement is determined only by the semicolon, one statement may be spread over multiple lines.

int
i
=
10
;

This is legal as well.

But we would not write code this way. A more common use of this feature is:

int i = 10,
    j = 20;
2.1.2.1.2.8.2 Expression

Now that we know statements, another important part of a C program is the expression.

An expression is a form that combines values with operations.

The most basic expressions used in programs are calculations.

1 + 2
i = 0
printf("Hello, World")

These are all expressions, and each one finally yields the result of its operation.

A statement may contain expressions, but an expression on its own is not yet a statement.

Most of the time an expression produces a value that can be used later in the program.

Furthermore, expressions can be nested.

printf("%d", 1+1)

Here we have two expressions: the smaller one, 1+1, and the larger one that wraps it, printf("%d", ~).

Once we add a semicolon after it, the whole expression becomes a statement.

printf("%d", 1+1);

It is then ready to do its work.

As you might imagine, since a function call is a valid expression that can be turned into a statement, we can also add a semicolon after a calculation to get a statement.

1;
8*2;

But such statements are meaningless.

2.1.2.1.2.8.3 Code Block

When programming, we sometimes want several operations to be executed together as one unit.

For that we need code blocks, or "compound statements": statements grouped together and wrapped in a pair of braces. For example:

{
  int x;
  x = 1;
}

The group is treated as one large statement by the rest of the program.

No semicolon is needed after the closing brace.

2.1.2.1.2.8.4 Empty Lines & Space

Spaces are not only there for looks; we need them in code to separate different syntax elements.

For example, why do we always need a space between int and i? Because if we dropped it, the compiler would only see inti, a single unknown name rather than a type followed by a variable name.

It is the same reason we put spaces between different words (even in Chinese).

So whenever removing a space does not change the structure of the code, that space can be deleted.

Empty lines, that is, lines that contain no code, behave much like spaces. If an empty line is not strictly needed, it is only there for looks and can be removed.

The example here shows a case where the spaces can be dropped.

int x = 1;
// Equals to
int x=1;
2.1.2.1.2.8.5 Comment

Comments are another thing that does not affect the behaviour of our code. When the compiler meets a comment, it simply ignores it, so a comment behaves like a space in the code.

There are two ways to write comments.

  • /* ... */ : multi-line comment, also usable inline; anything between /* and */ is ignored.
  • // ... : single-line comment; anything after it up to the end of the line is ignored.

The example program above uses both, which gives a simple picture of how comments work.

2.1.2.1.2.9 Variables & Variable space

Here we come to the most important part of a program. We will learn what a variable is, how it is defined, and which operations work on it.

First, let us look at the relation between variable and value.

2.1.2.1.2.9.1 Data, Variable, Value

Data is something that represents something else and carries information. It is the object we manipulate in a program.

But how do we describe a piece of data? We use something called a "variable": a slot with enough space to store the data.

In general, then, a variable is a slot of storage that holds a value, and that value carries some specific data.

2.1.2.1.2.9.2 Definition

Before using a variable in our program, we must define it.

The basic forms of a variable definition are listed below:

<variable-type> <variable-name>;
[<decorator>] <variable-type> <variable-name> [= <literal-value>];

There is also another way to declare a variable:

extern <variable-type> <variable-name>;

From all of them we can see that a variable is declared in the form "type name;".

Here, type can be any type specifier mentioned in the types section above.

For example,

int a;
int b;

Furthermore, once we have learnt about structures, enumerations, unions and functions, we will have more forms of types.

2.1.2.1.2.9.3 Variable Name

One must-have element of a variable definition is the type. The other is the variable name.

Once a variable is defined, we can refer to it by its name.

Just like calling someone by name.

Variable names in C must follow some rules:

  1. start with a letter or '_' (some compilers also accept '$' as an extension),
  2. contain no spaces,
  3. continue with letters, digits and '_' (and '$' where the compiler allows it),
  4. stay within 63 characters, since the standard only guarantees that the first 63 characters of a name are significant,
  5. not duplicate another name already defined in the same scope, and not be a keyword such as 'int'.

Keywords are words reserved for special use in a C program, for example int, if, continue. C also reserves some names for future use. Even where such a name can technically be used, doing so is discouraged.

Here are the most commonly used keywords and reserved names:

auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, inline, int, long, register, restrict, return, short, signed, sizeof, static, struct, switch, typedef, typeof, union, unsigned, void, volatile, while, _Generic

Beyond the keywords that cannot be used, there are some extra naming rules.

Names that start with two underscores ('_'), and names that start with one underscore followed by a capital letter, are reserved for the compiler.

By convention, names that start and end with two underscores are used by the system-wide standard library.

By convention, names that start with one underscore followed by a lower-case letter and end with one underscore are used by libraries.

Names written entirely in capital letters, with words separated by underscores, denote constants.

2.1.2.1.2.9.4 Initialize

Finishing the declaration does not mean the variable definition is finished.

A variable must be initialized before it is put to use. Otherwise, reading it may give an unpredictable value.

The first assignment to a variable is called "initialization".

Only with both the declaration and the initialization can we say the definition of a variable is complete.

As the forms listed above show, initialization can be done together with the declaration.

int a = 10;
2.1.2.1.2.9.5 Assignment Operations

Assignment is an operation specific to variables.

The simplest one uses the same notation as the equals sign in mathematics. We simply call it the assignment operation.

Operation  Description  Form
=          Assignment   A = val

After the program finishes an assignment, the value stored in the variable has been replaced.

int i = 20;
printf("%d", i);
// => 20
i = 9;
printf("%d", i);
// => 9

This is the meaning of "variable": a space that can store a value. The assignment operation finds that space and replaces the value inside it. Think of a drawer that holds exactly one thing: you can put one thing in, and you can clear the drawer and put a new one in.

2.1.2.1.2.9.6 Composed Assignment Operations

Beyond the regular assignment operation there are some more advanced ones. Assignment can be combined with other arithmetic operations, which gives the compound assignment operations.

Operation  Description                Form        Equivalent Form
+=         Addition Assignment        A += val    A = (typeof(A))(A + val)
-=         Subtraction Assignment     A -= val    A = (typeof(A))(A - val)
*=         Multiplication Assignment  A *= val    A = (typeof(A))(A * val)
/=         Division Assignment        A /= val    A = (typeof(A))(A / val)
%=         Modulus Assignment         A %= val    A = (typeof(A))(A % val)
^=         Bitwise XOR Assignment     A ^= val    A = (typeof(A))(A ^ val)
|=         Bitwise OR Assignment      A |= val    A = (typeof(A))(A | val)
&=         Bitwise AND Assignment     A &= val    A = (typeof(A))(A & val)
<<=        SHL Assignment             A <<= val   A = (typeof(A))(A << val)
>>=        SHR Assignment             A >>= val   A = (typeof(A))(A >> val)

The increment and decrement operators behave much like addition assignment and subtraction assignment by 1:

int a = 0;
a++;    // a => 1
a += 1; // equivalent, a => 2
--a;    // a => 1
a -= 1; // equivalent, a => 0
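
The "Equivalent Form" column can be checked with a small sketch. The function names are invented, and the wrap-around value assumes the usual 8-bit unsigned char.

```c
/* a += val on a narrow type: the sum is computed as int, then converted back. */
unsigned char add_assign(unsigned char a, int val) {
    a += val;
    return a;
}

/* The spelled-out equivalent form from the table. */
unsigned char add_then_cast(unsigned char a, int val) {
    a = (unsigned char)(a + val);
    return a;
}

/* Several compound assignments applied one after another. */
int apply_all(int a) {
    a += 3;
    a *= 2;
    a -= 4;
    a /= 3;
    a %= 4;
    return a;
}
```

Both add_assign(200, 100) and add_then_cast(200, 100) give 44, because 300 does not fit into 8 bits; apply_all(7) goes through 10, 20, 16, 5 and ends at 1.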
2.1.2.1.2.10 Type Conversion

As mentioned before, C is a typed language, and variables of different types occupy different amounts of space.

So for a variable of type int to be used as a long, its value must be converted to type long. The way to achieve this is called type conversion.

In the types section we learnt about type promotion, a special kind of automatic type conversion. Promotion always converts from a smaller range to a larger one, which is why we also need forced (explicit) type conversion.

To convert a value from one type to another, write the target type in parentheses before the expression.

(int)10ll; // same as 10
(char)12;  // same as '\14'

char c;
int i = 3000;
c = (char)i; // typically keeps only the lowest byte of i

But forced type conversion has a serious problem: it may lose information. Converting from int to char goes from a large range to a smaller one, and it simply discards the higher part of the int value. Converting short to int is different: all the data fits into the lower part of the int and nothing is lost.

For example,

  Short: 0010'0000 1000'0011 =>
  Char:  1000'0011
  Int:   0000'0000 0000'0000-0010'0000 1000'0011

This may cause unexpected results.

Conversion from real numbers to integers has the same kind of problem: all digits after the decimal point are dropped.
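
The narrowing and widening shown in the diagram can be sketched with fixed-width types. The function names are invented; unsigned types from <stdint.h> are an assumption made here so that the result of narrowing is fully defined.

```c
#include <stdint.h>

/* Narrowing 16 bits to 8 bits: the higher byte is discarded. */
uint8_t low_byte(uint16_t value) {
    return (uint8_t)value;
}

/* Widening 16 bits to 32 bits: the value is kept, the new high bits are 0. */
uint32_t widen(uint16_t value) {
    return (uint32_t)value;
}

/* Real to integer: everything after the decimal point is dropped. */
int drop_fraction(double x) {
    return (int)x;
}
```

With the value 0x2083 from the diagram, low_byte gives 0x83 and widen gives 0x00002083; drop_fraction(3.99) is 3.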

2.1.2.1.2.11 Input And Output

Programs do not only calculate; they also have to report their results. Input and output facilities are therefore indispensable.

In C, the most useful input and output functions are printf and scanf.

2.1.2.1.2.11.1 printf

printf stands for "print formatted"; it is a formatted output function.

Basically, printf displays information on the screen. Its more advanced feature is formatting the output string.

2.1.2.1.2.11.1.1 Output

The most basic usage of printf looks like this:

printf("output string")

Everything inside the quotation marks (the string delimiters) is displayed as is, except for '%'.

For example, the printf here prints "output string" to the terminal, the black-background window on your computer.

The name "terminal" comes from a piece of hardware used long, long ago.

Note that the example shown here is only an expression, not a statement. To make it work, you have to add a semicolon, ';', after the whole expression.

In most cases the system flushes output when it meets a carriage return, a line feed, or both, but printf never adds either of them after the content has been printed. So, to make the output look normal, add a newline mark at the end of the string:

printf("string with new line mark at end\n")

Besides the end of a line, the newline mark can also appear inside the text.

printf("string\nwith new line mark inside\n")

This does the same as the following:

printf("string\n");
printf("with new line mark inside\n");

(Why are there semicolons at the end here? Because two separate expressions written one after another like this must each be terminated to become statements.)

2.1.2.1.2.11.1.2 Placeholder & format

And how about the advanced features?

The format feature is provided by placeholders. Remember the '%' mentioned before? The percent sign introduces a placeholder, which is why it cannot be printed directly with printf. To print a '%' on the screen, write it as "%%" in the format string, the first argument given to printf.

Since printf means "print formatted", placeholders must do more than just keep the percent sign from being printed literally. Let us look at them more closely.

As we know, C classifies data into different types, so placeholders need different forms for printf to tell them apart. The part of a placeholder that selects the type is called the "type specifier". A full placeholder is written according to this syntax:

<placeholder> ::= '%' [flags] [width] [.precision] [length] <type specifier>
flags         ::= '-' | '+' | space | '#' | '0'
width         ::= <number> | '*'
precision     ::= <number> | '*'
length        ::= 'h' | 'l' | 'll' | 'L'

Looks complex? Just take a quick glance and move on; examples say more than the standard:

Type specifier  Description                               Form  Expected Data
a, A            Output reals in hexadecimal               %a    Reals: float, double
d               Output integer in decimal                 %d    Integers: char, short, int
o               Output integer in octal                   %o    Integers: char, short, int
x, X            Output integer in hexadecimal             %x    Integers: char, short, int
u               Output unsigned integer in decimal        %u    Unsigned integers: unsigned char, short, int
f               Output reals in decimal                   %f    Reals: float, double
e, E            Output reals in exponent notation         %e    Reals: float, double
g, G            Output reals in the shorter form          %g    Reals: float, double
c               Output a character                        %c    Character: char
s               Output a character string                 %s    String: char[]
p               Output an address                         %p    Pointer: void *

And their long variants:

Type specifier  Description                               Form   Expected Data
ld              Output integer in decimal                 %ld    Integers: long
lo              Output integer in octal                   %lo    Integers: long
lx, lX          Output integer in hexadecimal             %lx    Integers: long
lu              Output unsigned integer in decimal        %lu    Unsigned integers: unsigned long
lld             Output integer in decimal                 %lld   Integers: long long
llo             Output integer in octal                   %llo   Integers: long long
llx, llX        Output integer in hexadecimal             %llx   Integers: long long
llu             Output unsigned long long in decimal      %llu   Unsigned integers: unsigned long long
lf              Output reals in decimal (same as %f)      %lf    Reals: double
le, lE          Output reals in exponent notation         %le    Reals: double
lg, lG          Output reals in the shorter form          %lg    Reals: double
%               Output %                                  %%     None

Here is the flags part:

Flag     Form  Expected Data  Description
-        %-d   None           Align left; the default is right
+        %+d   None           Always output the sign; by default '+' is not shown for positive numbers
(space)  % d   None           Insert a space before the output
#        %#x   None           Show '0', '0x' or '0X' with o, x, X; always show the decimal point with e, E, f; keep trailing zeros with g, G
0        %05d  None           Pad with 0 instead of spaces

Width, .precision and length:

Field     Form       Expected Data  Description
(number)  %8d        None           Minimum number of characters to print, padded with spaces; output longer than this value is not truncated
*         %*d        Integer: int   Width is not given in the format string but passed as an argument before the value to be formatted
.number   %.10d %.f  None           For integers (d, i, o, u, x, X): minimum number of digits, padded with leading 0; longer values are unaffected; precision 0 with value 0 prints nothing. For e, E, f: digits after the decimal point. For g, G: maximum number of significant digits. For s: maximum length of the string; by default all characters are printed up to '\0'. For c: no effect. With no precision at all, integers default to 1; a '.' with no number means 0
.*        %.*d       Integer: int   Precision is not given in the format string but passed as an argument before the value to be formatted
h         %hd        None           Argument is a short, for i, d, o, u, x, X
l         %ld        None           Argument is a long, for i, d, o, u, x, X; a wide character, for c; a wide-character string, for s; no effect for f, where the argument is already a double
ll        %lld       None           Argument is a long long, for i, d, o, u, x, X
L         %Lf        None           Argument is a long double, for e, E, f, g, G

printf returns the total number of characters it printed.

You can now print the ASCII table using printf:

#include <stdio.h>

int main(void) {
  for (int i = 0; i < 128; i ++) {
    printf("ASCII: %5d, Char: %c;\n", i, i);
  }
}

The declaration of the printf function is written as:

int printf(const char * fmt, ...);

So you can call it in any of these forms:

printf("format string")
printf("format string", arguments)
printf("format string", arguments, arg2)
printf("format string", arguments, arg2, arg3)
...
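
To see several flags, widths and precisions at work in one place, here is a small sketch. It uses snprintf, the standard sibling of printf that writes into a character buffer instead of the screen, so the result can be inspected. The function name format_samples is invented for this example.

```c
#include <stdio.h>

/* Formats a few sample values into out and returns what snprintf returns:
 * the number of characters written, not counting the final '\0'. */
int format_samples(char *out, size_t size) {
    return snprintf(out, size,
                    "[%5d][%-5d][%05d][%+d][%#x][%.2f][%.3s][%*d]",
                    42,        /* %5d  : right-aligned in 5 columns  */
                    42,        /* %-5d : left-aligned in 5 columns   */
                    42,        /* %05d : padded with zeros           */
                    42,        /* %+d  : sign always shown           */
                    255u,      /* %#x  : hexadecimal with 0x prefix  */
                    3.14159,   /* %.2f : two digits after the point  */
                    "abcdef",  /* %.3s : at most three characters    */
                    4, 7);     /* %*d  : width 4 taken from argument */
}
```

With a buffer of 64 characters, the buffer ends up holding [   42][42   ][00042][+42][0xff][3.14][abc][   7].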
2.1.2.1.2.11.2 scanf

Once we have learnt the output part, we should also take a look at the input part.

The usage of scanf is roughly like that of printf, except for how the arguments are passed: scanf needs the address of each variable (written &variable) so that it can store the input there. scanf stands for "scan formatted", so it needs placeholders just as printf does.

Placeholders are written in this form:

<placeholder> ::= '%' ['*'] [width][modifiers] <type specifier>

Somewhat like printf, right?

Part       Description                                                        Form  Expected Data
*          Discard the input: skip data matching the type without storing it  %*d   None
width      Maximum number of characters to read                               %8d   None
modifiers  Length modifiers for the type specifier, as in printf              %ld   None
type       The type to scan the data as                                       %d    None

Type specifier    Description                                                                     Form                                Expected Data
a, A              reals                                                                           scanf("%a", &f)                     float
c                 characters; with a width, reads that many characters into a char array         scanf("%c", &c), scanf("%3c", buf)  char, char[]
d                 integer written in decimal, '+' or '-' is optional                              scanf("%d", &i)                     int
ld                integer written in decimal, '+' or '-' is optional                              scanf("%ld", &l)                    long
lld               integer written in decimal, '+' or '-' is optional                              scanf("%lld", &ll)                  long long
e, E, f, F, g, G  real numbers, '+' or '-' is optional, the 'e' exponent part is optional         scanf("%f", &f)                     float
i                 integer in decimal, octal (leading 0) or hexadecimal (leading 0x)               scanf("%i", &i)                     int
o                 integer written in octal                                                        scanf("%o", &i)                     int
s                 string, delimited by blanks                                                     scanf("%s", s)                      char[]
u                 unsigned int                                                                    scanf("%u", &u)                     unsigned int
x, X              int written in hexadecimal                                                      scanf("%x", &i)                     int
p                 pointer                                                                         scanf("%p", &p)                     void *
[]                character ranges, a simplified regular expression                               scanf("%[1-9]", s)                  char[]
%                 a literal %                                                                     scanf("%%")                         None

Sample question: A+B Problem:

#include <stdio.h>

int main(void) {
  int a, b;
  scanf("%d%d",&a, &b);
  printf("%d + %d = %d", a, b, a + b);
  return 0;
}
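
One habit worth adding to the A+B program is checking the return value, which for the scanf family is the number of items successfully read. The sketch below uses sscanf, the standard variant that reads from a string rather than from the keyboard, so it can be tried without typing input. The function name parse_and_add is invented for this example.

```c
#include <stdio.h>

/* Reads two integers from text. On success stores their sum in *sum
 * and returns 1; returns 0 if two integers could not be read. */
int parse_and_add(const char *text, int *sum) {
    int a, b;
    if (sscanf(text, "%d%d", &a, &b) != 2) {
        return 0;   /* fewer than two numbers were found */
    }
    *sum = a + b;
    return 1;
}
```

For the input "3 4" the function stores 7 and returns 1; for "3 x" it returns 0 and leaves the sum untouched.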
2.1.2.1.2.12 Conditional Statement

A program is not only a tool for calculation; it also helps people solve problems that require decisions.

For that purpose there are conditional statements, which decide what to do according to conditions.

2.1.2.1.2.12.1 If

The if statement has the form:

if (condition) statement

When the condition expression evaluates to true, the statement part is executed.

if (x < y)
  printf("x less than y");

Here x < y is the condition expression; if x is indeed less than y, the program outputs the message.

This is only the simplest case. What if we want to execute several statements inside an if statement?

Remember code blocks? A code block groups several statements together. So:

if (max < x) {
  swap(x, max); // swap is assumed to be a helper defined elsewhere
  printf("x larger than current max, swap them");
}

Here two statements are executed when x is larger than the current max value.

2.1.2.1.2.12.2 If-Else

Besides the plain "if" statement, we sometimes need an "else" part.

if (condition)
  then-statement
else
  else-statement

As with the if statement, when condition is non-zero (that is, true), then-statement is executed; otherwise else-statement is executed.

Sometimes there are more than two cases to distinguish. You can write them like this:

if (cond1)
  then1-statement
else if (cond2)
  then2-statement
else if (cond3)
...
else
  else-statement

This is simply a set of nested if-else statements: each "else if" is a new if statement placed in the else part of the previous one. The flat form is used for looks, but you can also write it like this:

if (cond1) {
  then1
} else {
  if (cond2) {
    then2
  }
  ...
}

Very clear.
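
As a concrete instance of the if / else if / else chain, here is a minimal sketch. The function name compare is made up for this illustration.

```c
/* Returns -1 if x is less than y, 1 if x is greater than y, and 0 if they are equal. */
int compare(int x, int y) {
    if (x < y) {
        return -1;
    } else if (x > y) {
        return 1;
    } else {
        return 0;
    }
}
```

The conditions are tested from top to bottom, and only the first branch whose condition holds is executed.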

2.1.2.1.2.12.3 Ternary if-else operator

三元运算符

Although if-else is enough in most cases, it is still a statement, not an expression. So in some corner cases, writing everything with if-else results in more lines of code and more complexity.

For this reason C provides the ternary if-else operator. It forms an expression, so you can combine it with other expressions.

The ternary if-else looks like this:

condition ? then : else

When condition is true, the then part is evaluated; when condition is false, the else part is evaluated. Only one of the two is ever evaluated, and its value becomes the value of the whole expression.

So, you may write:

int i = 10;
i = i - 100 < 0 ? 0 : i - 100;

Or, in C++, you may find that you can write the following. (We must mention C++ explicitly here, because this use of the ternary operator is not allowed in pure C, and many programmers do not distinguish between C and C++.)

int i = 0;
int j = 10;
(i < j ? i : j) = 1;

(The second case is valid in C++ because, when both branches are lvalues of the same type, the conditional expression is itself an lvalue, so it can stand on the left side of =. Here it assigns 1 to whichever of i and j is smaller. In C the result of ?: is never an lvalue, so a C compiler rejects this line.)

Both are correct in their respective languages, but the second style is discouraged.

2.1.2.1.2.12.4 Switch-Case

In addition to the if-else statement, we also have the switch-case statement.

switch (object) {
  case label:
    statements
  case label:
  ...
}

A label is either "case literal-value" or "default", and you do not need braces when one case has multiple statements. Each label marks an entry point: when object matches a label, execution starts at that label and continues downward, through any later labels, until it meets a break statement or the closing brace. If no case matches, execution starts at default, when present.

A legal switch-case statement may then look like:

int i = 4; // pick any value (an uninitialized i must not be read)
switch (i) {
  case 1:
  case 2:
    printf("less than 3\n");
    break;
  case 4:
    printf("larger than 3\n");
  case 5:
    printf("larger than 4\n");
  default:
    printf("do nothing\n");
    break;
}
2.1.2.1.2.12.4.1 Break statement

But what does the break statement do?

The break statement has two uses. The first is the one shown here: break jumps out of the switch-case statement's execution sequence.

When C finds that object matches a label, it executes every statement after that label until it reaches the closing brace. In most cases that is not what you want. When a break statement is executed, control simply leaves the switch-case statement, and the remaining statements inside are skipped. In the example above, the value 1 or 2 prints only the first message, while 4 falls through and prints all three remaining messages.

Although break is not mandatory in switch-case, it is a good habit to end each case with one.

2.1.2.1.2.13 Loop

What if you want to execute the same, or nearly the same, statements many times? This is where loops come in.

Loops are statements that execute other statements repeatedly, as long as some condition holds.

2.1.2.1.2.13.1 While

The while loop looks similar to the if statement,

while (condition)
  loop-body

and works similarly as well. When condition is true, loop-body is executed; then the condition is tested again, and this repeats until the condition becomes false.

Another similarity between the while loop and the if statement is that the body is a single statement. If you want multiple statements in the body, you must wrap them in braces.

while (1) {
  printf("infinity loop\n");
}
2.1.2.1.2.13.2 For

The for loop is another type of loop. The name "for" may not make its purpose obvious at first:

for (initial; condition; update)
  loop-body

A for loop always has four parts.

The initial part lets you define and initialize loop variables; it runs once, before the loop starts. The condition part is the same as in the while loop: if it is true, the body is executed; otherwise the loop ends. The loop-body, as with if and while, is the statement executed on each round. Finally, the update part runs every time the loop-body finishes, typically to update the loop variable, after which the condition is tested again.

for (int i = 0; i < 10; i ++) {
  printf("%d", i);
}

Another important point: of the four parts of a for loop, the initial, condition, and update parts can each be left empty. An empty condition counts as always true. Thus, in some special cases you may find that

for (;;)
  body

can be used as an infinite loop.

2.1.2.1.2.13.3 Do-While

But what if we need to execute body at least once?

Then we need do-while loop.

do {
  body
} while (condition);

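Unlike the other loops, do-while tests its condition after the body, so the body always runs at least once. Braces are conventionally always written here (the language only requires them for a multi-statement body), and the semicolon after while (condition) is mandatory.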

2.1.2.1.2.13.4 Break

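Here is the other use of break: when a break statement is executed inside the body of a loop, control jumps out of the whole loop. Everything after the break is skipped, including the update part of a for loop.

This is the same behavior as in switch-case. Note that break only leaves the innermost enclosing loop or switch.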

2.1.2.1.2.13.5 Continue

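Sometimes you need to skip only the rest of the body without leaving the loop. For that, use the continue statement.

When continue is executed, the loop goes straight to its next round: in a for loop the update part runs, then the condition is tested, and the body starts over if the condition still holds. In while and do-while loops, control goes directly to the condition test.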

2.1.2.1.2.14 Array

When we deal with a small amount of data, defining a few variables is enough. But what about a whole sequence of data?

For example, read the scores of over 500 students and sort them.

An average or a maximum can be computed with only one or two variables, but sorting requires storing all of the values.

Arrays are linear, contiguous data structures that store values of the same type.

A one-dimensional array is defined as follows:

type name[length];

Furthermore, arrays can be multi-dimensional.

type name[length][length];
type name[length][length][length];
...

Once an array is defined, it holds length elements. You access them by index, which runs from 0 to length - 1:

name[idx];

Each element can be treated as a regular variable whose type is the type used to define the array.

We can then traverse the array with a loop:

int arr[10];
for (int i = 0; i < 10; i ++) {
  arr[i] = i;
}

Then, how can we initialize an array?

There are two main ways:

type name[] = {value1, value2, ...};
type name[length] = {value1, value2, ...};

type name[][length] = {value1, value2, ..., value6, ...};
type name[length][length] = {value1 ...};
type name[length][length] = {{value1, ...}, {value_length, ...}};
...

The first way is to omit the length and just wrap the initial values in braces; the array then gets a length equal to the number of initial values. The second way is to specify the length and also provide initial values in braces; if there are fewer values than elements, the remaining elements are set to zero.

For multi-dimensional arrays, you must specify every dimension's length except the first. You can write all initial values in a single pair of braces, or separate the elements of each sub-array with their own pair of braces.

2.1.2.1.2.14.1 C Style String

Finally, we come to strings.

As mentioned before, strings and characters have a special relationship. In fact, strings in the C programming language are arrays of char.

C treats a char array that ends with '\0' (the null character, whose value is 0) as a string.

2.1.2.1.2.15 sizeof

Although it is possible to traverse arrays using a literal length, that is not very convenient.

To simplify this, we can use the sizeof operator:

sizeof(type)
sizeof(variable)
sizeof(array)

The sizeof operator yields the total size of the target type, variable, or array in bytes. So, to get the number of elements in an array, we divide its total size by the size of one element:

int len = sizeof(array) / sizeof(type);
2.1.2.1.2.16 Iterator

Using an index variable idx is one way to traverse an array. Another way to achieve the same goal is to use an iterator.

int a[10];
for (int*p = a; p < a + 10; p ++) {
  *p = 1;
}

Here we define p as an iterator over the array a, and use it to walk through the whole array.

The p here is called a pointer to int.

More details will be covered in the Pointers section.

2.1.2.1.2.17 Function

A function is a kind of contract: it accepts some input and generates output. As with functions in mathematics, the ideal is that the same input always produces the same output. Furthermore, the form of a function is almost the same as in math:

int func(int R);

You may read it as the function $f: N \to N$, or $f(x) \in N$ for $x \in N$. And

float func(float a, float b);

may represent the function $f: R \times R \to R$, that is, $f(v) \in R$ for $v = (a, b)$ with $a, b \in R$.

Formally, the input of a function in the C programming language is zero or more parameters, and the output is the so-called "return value". There are also other ways to pass output back besides the regular return, for example through pointer parameters.

Ideally, a function does not affect anything outside itself; such a function is called a pure function. But in real programs, functions often need to do more than calculate, for example perform I/O. Any operation that modifies memory or variables outside the function's own scope, or performs I/O, is a side effect of the function.

In particular, some functions in the C programming language return nothing at all (their return type is void) and are called only for their side effects.

2.1.2.1.2.17.1 Definition

To understand functions in C, first look at the function declaration.

A function declaration works much like a variable declaration, but its main purpose is to tell the compiler a function's name, return type, and parameters, rather than to allocate any storage.

We call it a prototype.

<return-type> <function-name>(<parameters> ...);

Prototypes are usually placed in header files.

For example, the prototype for a function add that produces the sum of two integers looks like this:

int add (int a, int b);

Here we declare the function add, which accepts two arguments, corresponding to the parameters a and b respectively.

Just as a variable must be initialized before it is read, a function must have a complete implementation (its definition) somewhere in the program before it can actually be called.

A function implementation looks roughly like the declaration, but with an extra function body:

<return-type> <function-name> (<parameters> ...) {
  <function-body>...
}

The body consists of regular statements, and may also contain a return statement.

The purpose of the return statement is to tell the program which value is the return value of the function; executing it also ends the function immediately.

It plays the role of the equals sign in $f(x, y) = x + y$.

Here we implement the function add:

int add (int a, int b) {
  return a + b;
}
2.1.2.1.2.17.2 Function Calling

Once a function has been defined, it can be used in our program with the function call syntax.

As mentioned at the very beginning of this tutorial, a function call is written in this form:

<function-name> (<arguments> ...)

The arguments must match the parameters in number, order, and type.

For example, if we have a function add defined before,

int add(int a, int b){
  return a + b;
}

Then we can use it like:

#include <stdio.h>

int main(void) {
  int a = 10;
  a = add(a, 20);
  printf("%d", a);
  return 0;
}

The first argument we provide to add is the integer variable a, which has the same type as the parameter a. The second argument is the literal value 20; since an integer literal without a suffix (and small enough to fit) has type int in C, it also has the same type as the parameter b. Thus, the function call is acceptable.

But what if we provide too few arguments, too many, or arguments of the wrong type? With the prototype in view, the compiler reports an error for a wrong number of arguments. For a type mismatch, it converts the argument implicitly when the types are convertible (for example double to int), and reports an error otherwise.

2.1.2.1.2.17.3 Recursion

Since a function can be called within the body of other functions, there is no reason to prevent a function from calling itself.

A function that calls itself is called a recursive function.

For example, factorial function can be defined using recursion:

int factorial(int n) {
  if (n == 0) {
    return 1;
  } else {
    return n * factorial(n - 1);
  }
}

The basic structure of a recursive function is the same as that of a normal function; the only difference is that it calls itself within its body.

But since a recursive function could otherwise call itself forever, it must have a termination condition that stops further calls.

Here the if statement serves as the termination condition. When n equals 0, the function returns 1 directly, without calling itself again. Note that a negative n never reaches 0 this way, so this version must only be called with n >= 0.

2.1.2.1.2.17.4 Function Tail Call Optimization

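In some cases, the last operation of a function is a call to another function. This is called a tail call.

If the last operation of a function is a call to itself, it is called tail recursion.

Normally, a very deep (or infinite) recursion results in a stack overflow, because every call adds a new stack frame. With tail call optimization, the compiler reuses the current frame for a tail call, so tail recursion runs in constant stack space. The C standard does not require this optimization; compilers typically apply it only when optimizations are enabled.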

A programming style closely related to tail calls is Continuation-Passing Style, in which every call is a tail call.

2.1.2.1.2.17.4.1 Continuation-Passing Style

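Continuation-Passing Style (CPS) is a style of programming in which control is passed explicitly in the form of a continuation: instead of returning its result, a function hands the result to another function that represents "the rest of the computation".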

2.1.2.1.2.18 Assembly
2.1.2.1.2.18.1 Architecture
2.1.2.1.2.18.1.1 AMD64 (x86_64)
2.1.2.1.2.18.1.2 Aarch64 / arm64
2.1.2.1.2.18.1.3 MIPS / Loong
2.1.2.1.2.18.2 BUS
2.1.2.1.2.18.2.1 Bridges
2.1.2.1.2.18.3 CPU
2.1.2.1.2.18.4 Intel Syntax, AT&T Syntax
2.1.2.1.2.18.5 Memory Access
2.1.2.1.2.18.6 Commands
2.1.2.1.2.18.7 Direct Memory Access
2.1.2.1.2.19 Stack
2.1.2.1.2.19.1 Frames
2.1.2.1.2.19.2 Stack Variables, Local Variables
2.1.2.1.2.19.3 Recursion Function Expansion
2.1.2.1.2.20 Global Variables
2.1.2.1.2.21 Variable Scope
2.1.2.1.2.21.1 Dynamic Scope
2.1.2.1.2.21.2 Lexical Scope
2.1.2.1.2.21.2.1 Function Scope
2.1.2.1.2.21.2.2 Block Scope
2.1.2.1.2.22 Closure
2.1.2.1.2.23 Heap Space
2.1.2.1.2.23.1 Variable Allocation
2.1.2.1.2.24 Memory Management
2.1.2.1.2.24.1 Virtual Memory (OS)
2.1.2.1.2.25 Function Call
2.1.2.1.2.25.1 Function Stack
2.1.2.1.2.25.2 Function In Assembly
2.1.2.1.2.26 goto
2.1.2.1.2.27 User Defined Types
2.1.2.1.2.27.1 Struct
2.1.2.1.2.27.1.1 Bit Field
2.1.2.1.2.27.1.2 Simulate class Using Structure
2.1.2.1.2.27.1.3 Virtual Function Table
2.1.2.1.2.27.2 Enum
2.1.2.1.2.27.3 Union
2.1.2.1.2.28 Structure space, Memory Alignment & Offset
2.1.2.1.2.29 Pointers
2.1.2.1.2.29.1 Pointer offset, index & linked list
2.1.2.1.2.29.2 Array, Pointers Points To Continuous Memory
2.1.2.1.2.29.3 Function pointers
2.1.2.1.2.29.3.1 Form
2.1.2.1.2.29.3.2 Function As Function Pointer
2.1.2.1.2.29.3.3 Calling With Function Pointer
2.1.2.1.2.29.3.4 Simplified Function Call
2.1.2.1.2.29.4 Void Pointers
2.1.2.1.2.29.5 Pointer Convert
2.1.2.1.2.30 Pointer in Assembly
2.1.2.1.2.31 Exception
2.1.2.1.2.31.1 setjmp, longjmp
2.1.2.1.2.31.2 Try-Catch, Throw
2.1.2.1.2.31.3 SEH, Structured Exception Handling
2.1.2.1.2.31.4 Herbexception
2.1.2.1.2.31.5 Exception spread
2.1.2.1.2.31.6 Condition System
2.1.2.1.2.31.7 Continuous
2.1.2.1.2.32 Preprocessor
2.1.2.1.2.32.1 Header files, #include
2.1.2.1.2.32.2 Macro
2.1.2.1.2.32.2.1 C Style Macro
2.1.2.1.2.32.2.2 M4 Macro Language
2.1.2.1.2.32.2.3 C++ Template
2.1.2.1.2.32.2.4 Rust Procedure Macro
2.1.2.1.2.32.2.5 Rust Macro Rules
2.1.2.1.2.32.2.6 Macro Assembly, Pseudocode
2.1.2.1.2.32.2.7 Common Lisp Expansion Macro
2.1.2.1.2.32.2.8 Common Lisp Reader Macro
2.1.2.1.2.32.2.9 Scheme Hygiene Macro System
2.1.2.1.2.32.2.10 Scheme Syntax Rules
2.1.2.1.2.32.2.11 Scheme Syntax Case
2.1.2.1.2.32.2.12 Hygiene for the Unhygienic
2.1.2.1.2.32.3 Compiler Comments
2.1.2.1.2.32.4 #pragma
2.1.2.1.2.33 Meta-programming
2.1.2.1.2.34 Compiler
2.1.2.1.2.34.1 Compile Process
2.1.2.1.2.34.2 Compiler Driver
2.1.2.1.2.34.3 Assembler
2.1.2.1.2.34.4 Assemble
2.1.2.1.2.34.5 Assembly Code
2.1.2.1.2.34.6 Linker
2.1.2.1.2.34.7 Link
2.1.2.1.2.35 Executable File
2.1.2.1.2.35.1 Object
2.1.2.1.2.35.2 Executable
2.1.2.1.2.35.3 Executable File Format
2.1.2.1.2.35.3.1 Portable Executable (PE)
2.1.2.1.2.35.3.2 Executable Linkable Format (ELF)
2.1.2.1.2.35.3.3 Mach-O (Fat Binary)
2.1.2.1.2.35.3.4 Common Object File Format (COFF)
2.1.2.1.2.35.3.5 Binary (Bin)
2.1.2.1.2.36 ABI
2.1.2.1.2.36.1 Function Call Conventions
2.1.2.1.2.36.1.1 __cdecl
2.1.2.1.2.36.1.2 __stdcall
2.1.2.1.2.36.1.3 __fastcall
2.1.2.1.2.36.1.4 thiscall
2.1.2.1.2.36.1.5 Microsoft 4-register fastcall __vectorcall
2.1.2.1.2.36.1.6 System V ABI syscall
2.1.2.1.2.36.2 Function Naming Convention
2.1.2.1.2.36.2.1 C Function Naming Convention
2.1.2.1.2.36.2.2 MSVC C++ Function Naming Convention
2.1.2.1.2.36.2.3 Rust Function Naming Convention
2.1.2.1.2.36.2.4 Common Lisp Naming Convention
2.1.2.1.2.36.3 Endian
2.1.2.1.2.36.4 Dynamic Linked Library
2.1.2.1.2.36.5 Static Linked Library
2.1.2.1.2.36.6 fPIE, fPIC
2.1.2.1.2.37 Multiple File Compile
2.1.2.1.2.37.1 Compile Unit
2.1.2.1.2.37.2 Object
2.1.2.1.2.38 Build Systems
2.1.2.1.2.38.1 C Project Management
2.1.2.1.2.38.2 Makefiles
2.1.2.1.2.38.3 AutoTools
2.1.2.1.2.38.4 CMake
2.1.2.1.2.38.5 VSXMake (VSProj)
2.1.2.1.2.38.6 XMake
2.1.2.1.2.39 Variable Decorator
2.1.2.1.2.40 asm volatile (assembly code : output operands : input operands : clobbers)
2.1.2.1.2.41 __attribute__((attribute))
2.1.2.1.2.42 _Generic
2.1.2.1.2.43 ..., va_start, va_arg, va_end Macro, stdarg.h
2.1.2.1.2.44 __VA_ARGS__
2.1.2.1.2.45 Variable Length Array
2.1.2.1.2.46 ASCII, EBCDIC, Unicode/UCS-2
2.1.2.1.3  From The C Programming Language To Theoretical Computer Science (Section II) [S2]
2.1.2.1.3.1 From the C programming language to Theoretical Computer Science
2.1.2.1.3.1.1 Object-Oriented Programming
2.1.2.1.3.1.2 Generic Types
2.1.2.1.3.1.2.1 Template
2.1.2.1.3.1.2.2 Types Erase
2.1.2.1.3.1.3 Inheritance
2.1.2.1.3.1.3.1 Class Object
2.1.2.1.3.1.3.2 Prototype Chain
2.1.2.1.3.1.4 Polymorphism
2.1.2.1.3.1.4.1 Interface
2.1.2.1.3.1.4.2 Trait
2.1.2.1.3.1.4.3 Duck Type
2.1.2.1.3.1.5 Encapsulation
2.1.2.1.3.1.5.1 Accessibility
2.1.2.1.3.1.6 Object System
2.1.2.1.3.1.6.1
2.1.2.1.3.1.7 Turing Machine
2.1.2.1.3.1.8 Lambda Calculus
2.1.2.1.3.1.9 First Order Function
2.1.2.1.3.1.9.1 Church numeral
2.1.2.1.3.1.10 Formal Verification
2.1.2.1.3.1.11
2.1.2.2 computation theory
2.1.2.2.1  MIT 18.404j Theory of Computation (junior) [S1]
2.1.2.2.1.1 Applications
2.1.2.2.1.2 Models of computation

A model captures the important aspects of the thing we are trying to understand.

2.1.2.2.1.2.1 Finite Automata

A finite automaton uses a small, fixed amount of memory and has limited computational ability.

Each finite automaton has:

  • States: $q_1, q_2, q_3$
  • Transitions: arrows between states, each labeled with an input symbol such as 1
  • Start state: $q_1$
  • Accept state: $q_3$

It takes a finite string as input and outputs either accept or reject.

Begin at the start state, read the input symbols one by one, and follow the corresponding transitions. Accept if the run ends in an accept state; reject if not.

We say that "$M_1$ accepts exactly those strings in $A$, where $A = \{w \mid w \text{ contains substring } 11\}$". $A$ is the language of $M_1$, written $L(M_1)$: $M_1$ recognizes $A$, and $A = L(M_1)$.

2.1.2.2.1.2.1.1 Define a finite automaton

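Defn: A finite automaton $M$ is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$:

  • $Q$: finite set of states
  • $\Sigma$: finite set of alphabet symbols
  • $\delta$: transition function $\delta: Q \times \Sigma \to Q$. Given a state and an input symbol, $\delta$ returns the next state (possibly the same one). E.g., $\delta(q, a) = r$
  • $q_0$: start state
  • $F$: set of accept states

For the example above:

  • $M_1 = (Q, \Sigma, \delta, q_1, F)$,
  • $Q = \{q_1, q_2, q_3\}$,
  • $\Sigma = \{0, 1\}$,
  • $F = \{q_3\}$,

together with the transition function $\delta$ given by the arrows of the state diagram.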

2.1.2.2.1.2.1.2 String and languages
  • A string (word) is a finite sequence of symbols from $\Sigma$ (the alphabet),
  • A language is a set of strings (finite or infinite),
  • The empty string $\varepsilon$ is the string of length 0,
  • The empty language $\emptyset$ is the set with no strings.

Defn: $M$ accepts the string $w = w_1 w_2 \cdots w_n$, each $w_i \in \Sigma$, if there is a sequence of states $r_0, r_1, \ldots, r_n \in Q$ where:

  • $r_0 = q_0$: the state sequence starts at the start state,
  • $r_i = \delta(r_{i-1}, w_i)$ for $1 \le i \le n$: each state follows from the previous one by the transition function,
  • $r_n \in F$: the sequence ends in an accept state.

Recognizing languages:

  • $L(M) = \{w \mid M \text{ accepts } w\}$,
  • $L(M)$ is the language of $M$,
  • $M$ recognizes $L(M)$.

Every machine may accept many words, but it recognizes only one language.

Defn: a language is regular if some finite automaton recognizes it.

2.1.2.2.1.2.1.3 Regular Languages

$L(M_1) = \{w \mid w \text{ contains substring } 11\} = A$

2.1.2.2.1.3 Regular Expressions
2.1.2.2.1.3.1 Regular Operations

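Let $A$, $B$ be languages:

  • Union: $A \cup B = \{w \mid w \in A \text{ or } w \in B\}$,
  • Concatenation: $A \circ B = \{xy \mid x \in A \text{ and } y \in B\} = AB$,
  • Kleene star (a unary operation): $A^* = \{x_1 \cdots x_k \mid \text{each } x_i \in A,\ k \ge 0\}$; note that $\varepsilon \in A^*$ always.

Note: the empty language does not contain the empty string, but the Kleene star of the empty language does, since $\emptyset^* = \{\varepsilon\}$.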

2.1.2.2.1.3.2 Regular expression

Just as a mathematical expression is built by combining mathematical operations and elements, a regular expression is built by combining regular operations and languages.

  • Built from $\Sigma$ (the alphabet), members of $\Sigma$, $\emptyset$ (the empty language), $\varepsilon$ (the empty word) [atomic]
  • Using $\cup$, $\circ$, $*$ [composite]

E.g., $(0 \cup 1)^* = \Sigma^*$ gives all strings over $\Sigma$.

Finite automata are equivalent to regular expressions: both describe exactly the regular languages.

2.1.2.2.1.4 Closure Properties for regular languages

A set is closed under an operation if applying that operation to objects in the set always yields a result that is still in the same set.

2.1.2.2.1.4.1 Union:

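If $A_1, A_2$ are regular languages, so is $A_1 \cup A_2$ (closure under $\cup$).

Proof: let $M_1 = (Q_1, \Sigma, \delta_1, q_1, F_1)$ recognize $A_1$,
and $M_2 = (Q_2, \Sigma, \delta_2, q_2, F_2)$ recognize $A_2$.
Construct $M = (Q, \Sigma, \delta, q_0, F)$ to recognize $A_1 \cup A_2$:
$M$ should accept input $w$ if either $M_1$ or $M_2$ accepts $w$.

Run $M_1$ and $M_2$ together, in parallel. The components of $M$ are:

  • $Q = Q_1 \times Q_2 = \{(q_1, q_2) \mid q_1 \in Q_1 \text{ and } q_2 \in Q_2\}$,
  • $q_0 = (q_1, q_2)$,
  • $\delta((q, r), a) = (\delta_1(q, a), \delta_2(r, a))$,
  • $F = (F_1 \times Q_2) \cup (Q_1 \times F_2)$.

Note: if instead $F = F_1 \times F_2$, the same construction shows closure under intersection.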

2.1.2.2.1.4.2 Concatenation:

If $A_1, A_2$ are regular languages, so is $A_1 A_2$ (closure under $\circ$).

A first attempt: let $M$ accept input $w$ if $w = xy$ where $M_1$ accepts $x$ and $M_2$ accepts $y$. Carried out directly with deterministic machines, this attempt fails, as explained below.

Proof attempt: Let $M_1 = (Q_1, \Sigma, \delta_1, q_1, F_1)$ recognize $A_1$, and $M_2 = (Q_2, \Sigma, \delta_2, q_2, F_2)$ recognize $A_2$. Construct $M = (Q, \Sigma, \delta, q_0, F)$ to recognize $A_1 A_2$.

The machine $M$ should accept input $w$ if there is a split of $w$ into $xy$ where $M_1$ accepts $x$ and $M_2$ accepts $y$.

To construct $M$:

For an input word $w$, there should be a split point where $M_1$ reaches an accept state and the run jumps to $M_2$ via an $\varepsilon$-transition.

Build a new machine by connecting $M_1$ and $M_2$ with $\varepsilon$-transitions from each accept state of $M_1$ to the start state of $M_2$.

But the first place where $M_1$ reaches an accept state may not be the correct split point. $M$ would need to keep track of all possible split points, which motivates non-determinism.

2.1.2.2.1.5 Non-determinism

A non-deterministic finite automaton is mostly the same as a deterministic one. In a deterministic finite automaton, there is exactly one transition for each pair of state and input symbol.

A non-deterministic finite automaton may have several transitions (or none) for the same pair of state and input symbol. This is what is called non-determinism.

From a given state on a given symbol, you may have one transition to one state and another transition to a different state.

It may also have $\varepsilon$-transitions, which let it move to another state without consuming any input symbol.

A non-deterministic finite automaton accepts an input if some path leads to an accept state. Acceptance always takes priority over rejection: the input is rejected only when every possible path ends in a non-accept state or dies.

The possible runs of a non-deterministic finite automaton form a tree, since at each state there may be multiple possible transitions for the same input symbol.

E.g., for the input "ab" on the automaton above, the possible runs are:

Any path that leads to an accept state means the input is accepted.

For "aa", it never reaches an accept state.

2.1.2.2.1.5.1 NFA

Defn: A nondeterministic finite automaton is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$:

  • $Q$: finite set of states
  • $\Sigma$: finite set of alphabet symbols
  • $\delta$: transition function $\delta: Q \times \Sigma_\varepsilon \to P(Q)$, where $\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}$ and $P(Q) = \{R \mid R \subseteq Q\}$. Given a state and an input symbol (or $\varepsilon$), $\delta$ returns a set of possible next states. E.g., $\delta(q, a) = \{r, s\}$
  • $q_0$: start state
  • $F$: set of accept states

E.g., in the example above:

  • $\delta(q_1, a) = \{q_1, q_2\}$
  • $\delta(q_1, b) = \emptyset$

The computation of an NFA is a kind of BFS:

  • Every time the machine reads an input symbol, it branches out to all possible next states.
  • The machine accepts if, once the whole input has been read, at least one branch is in an accept state; the other branches then no longer matter.

Or you may imagine that the machine makes a good guess at each step, always choosing the correct transition to reach an accept state if there is one.

2.1.2.2.1.6 NFA and DFA equivalence

NFAs and DFAs are equivalent in power: any language recognized by an NFA can also be recognized by a DFA, and vice versa.

2.1.2.2.1.6.1 NFA to DFA

Theorem: If an NFA recognizes a language $L$, then $L$ is regular.

Proof: Let NFA $M = (Q, \Sigma, \delta, q_0, F)$ recognize $L$.
Construct DFA $M' = (Q', \Sigma, \delta', q_0', F')$.

Basically, DFA $M'$ keeps track of the subset of states that NFA $M$ could currently be in. It simulates the NFA: every time a symbol is read, $M'$ updates its state to the set of states that $M$ may reach.

The way to achieve this is to give $M'$ one state for every possible subset of states of $M$. Each state of $M'$ thus remembers which subset of states the NFA is in.

Construction of DFA $M'$:

  • $Q' = P(Q) = \{R \mid R \subseteq Q\}$, the set of all subsets of states of NFA $M$.
  • $\delta'(R, a) = \{q \mid q \in \delta(r, a) \text{ for some } r \in R\}$, for $R \in Q'$
  • $q_0' = \{q_0\}$
  • $F' = \{R \in Q' \mid R \cap F \neq \emptyset\}$

Then DFA $M'$ simulates NFA $M$ by keeping track of all states that $M$ may reach after reading the input string.

In practice, the construction goes as follows. Start at the subset $\{q_0\}$, which is the start state $q_0'$ of DFA $M'$. For each input symbol, collect all states that NFA $M$ may reach from the states of the current subset; the result is again a subset of states of $M$, and therefore a state of $M'$. Then repeat the same step from each newly constructed subset, trying every input symbol with every state in the subset.

  • If any state in the subset can reach a new state, add that new state to the new subset.
  • If any state in a subset is an accept state of NFA $M$, then that subset is an accept state of DFA $M'$.

Repeat this until no new subsets can be formed.

P.S. The full construction contains states of DFA $M'$ that are unreachable from the start state $\{q_0\}$; discard those states. Some states of the DFA may be unable to reach any accept state; these can be considered dead states, and you may discard or keep them as you like.

P.S. If any state has $\varepsilon$-transitions, also add the states reachable via $\varepsilon$-transitions to the subset.

E.g., for the NFA above:

Since no transition leads to the states $\{q_2\}$, $\{q_3\}$ or $\{q_4\}$, they can be discarded.

The resulting DFA looks like:

2.1.2.2.1.6.2 Recall for Closure Properties
  • Union: Construct a new NFA with a new start state that has $\varepsilon$-transitions to the start states of the two NFAs. That is all.
  • Concatenation: Construct a new NFA that connects each accept state of the first NFA to the start state of the second NFA via $\varepsilon$-transitions; only the accept states of the second NFA remain accepting. That is all.
  • Star: Construct a new NFA that connects each accept state of the NFA back to its start state via $\varepsilon$-transitions. Also add a new start state that is itself an accept state, and connect it to the old start state via an $\varepsilon$-transition. That is all.
2.1.2.2.1.6.3 Regular Expression to NFA

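Theorem: If $R$ is a regular expression and $A = L(R)$, then $A$ is regular.

Proof:

Convert $R$ to an equivalent NFA $M$:

  • If $R$ is atomic:

    • $R = a$ for a symbol $a \in \Sigma$: two states joined by a single transition labeled $a$
    • $R = \varepsilon$: a single state that is both start and accept state
    • $R = \emptyset$: a single start state and no accept state
  • If $R$ is composite:

    • $R = R_1 \cup R_2$: given NFAs for $R_1$ and $R_2$, the union construction gives an NFA for $R$
    • $R = R_1 \circ R_2$: given NFAs for $R_1$ and $R_2$, the concatenation construction gives an NFA for $R$
    • $R = R_1^*$: given an NFA for $R_1$, the star construction gives an NFA for $R$

Then, by structural induction on $R$, we can show that NFA $M$ recognizes $A$.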

2.1.2.2.1.6.4 Generalized NFA

Similar to an NFA, but with more complex transitions: a GNFA allows transitions labeled with regular expressions.

Assume (for convenience):

  • one accept state, separate from the start state: connect all old accept states to a new accept state via $\varepsilon$-transitions, and treat the old accept states as normal states.
  • one arrow from each state to each state, except:

    • arrows only exit the start state
    • arrows only enter the accept state
    • connect states that have no transition between them via transitions labeled $\emptyset$.
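2.1.2.2.1.6.5 NFA to regular expression

Conversely, if a language $L$ is regular, then there is a regular expression $R$ such that $L = L(R)$.

Lemma: Every GNFA $G$ has an equivalent regular expression $R$.

Proof:

By induction on the number of states $k$ of GNFA $G$.

Basis ($k = 2$): $G$ consists of just the start state and the accept state, joined by a single arrow labeled with some regular expression $r$. Let $R = r$.

Induction step ($k > 2$): Assume the lemma is true for $k - 1$ states and prove it for $k$ states.

Convert the $k$-state GNFA $G$ to a $(k - 1)$-state GNFA $G'$ by removing one state $q_{rip}$ that is neither the start nor the accept state, and repair every path that went through $q_{rip}$. Concretely, if $q_i \to q_{rip}$ is labeled $r_1$, the self-loop on $q_{rip}$ is labeled $r_2$, $q_{rip} \to q_j$ is labeled $r_3$, and $q_i \to q_j$ is labeled $r_4$, then the new label on $q_i \to q_j$ is $r_1 r_2^* r_3 \cup r_4$.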

2.1.2.2.1.7 Non-regular languages
2.1.2.2.1.7.1 Pumping Lemma for regular languages

To show that a language is regular, just give a finite automaton or a regular expression for it.

To show that a language is non-regular, give a proof by contradiction using the pumping lemma.

The pumping lemma for regular languages describes a property that all regular languages must satisfy. If a language fails to satisfy this property, it is non-regular.

Pumping Lemma: For every regular language $A$, there is a number $p$ (the pumping length) such that if $s \in A$ and $|s| \ge p$, then $s = xyz$ where

  • $xy^iz \in A$ for all $i \ge 0$,
  • $y \neq \varepsilon$ ($y$ is not empty),
  • $|xy| \le p$.

Informally, any sufficiently long string in a regular language can be pumped (have a middle section repeated any number of times) and still be in the language.

Note the direction: every regular language can be pumped, but a language that can be pumped is not necessarily regular. The lemma can only be used to prove non-regularity.

The pumping lemma rests on the fact that if $M$ has $p$ states and reads at least $p$ symbols, it must enter some state at least twice (by the pigeonhole principle).

2.1.2.2.1.7.2 Using pumping lemma to show non-regularity
2.1.2.2.1.7.2.1 $D = \{0^n 1^n \mid n \ge 0\}$

Let $D = \{0^n 1^n \mid n \ge 0\}$. Show: $D$ is not regular.

Proof by contradiction: Assume $D$ is regular. Then, by the pumping lemma, there is a pumping length $p$. Let $s = 0^p 1^p \in D$, thus $|s| = 2p \ge p$.

The pumping lemma says that $s = xyz$ where

  • $xy^iz \in D$ for all $i \ge 0$,
  • $y \neq \varepsilon$,
  • $|xy| \le p$.

Since $|xy| \le p$, both $x$ and $y$ consist only of 0s, so $y = 0^k$ for some $k \ge 1$. But then $xyyz$ has more 0s than 1s, thus $xyyz \notin D$, a contradiction.

Therefore the assumption is false, and $D$ is not regular.

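2.1.2.2.1.7.2.2 $F = \{ww \mid w \in \Sigma^*\}$, $\Sigma = \{0, 1\}$

Let $F = \{ww \mid w \in \Sigma^*\}$, $\Sigma = \{0, 1\}$. Show $F$ is not regular.

Proof by contradiction: Assume $F$ is regular. Then, by the pumping lemma, there is a pumping length $p$. Let $s = 0^p 1 0^p 1 \in F$.

According to the pumping lemma, $s = xyz$ where

  • $|xy| \le p$, so $xy$ lies entirely within the leading 0s of the first half of $s$.

Then $xyyz = 0^{p+k} 1 0^p 1$ for some $k \ge 1$, which has more 0s before its first 1 than between its two 1s, so it cannot be written as $ww$ and $xyyz \notin F$.

Contradiction found, thus $F$ is not regular.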

2.1.2.2.1.7.2.3 $B = \{w \mid w \text{ has an equal number of 0s and 1s}\}$

Let $B = \{w \mid w \text{ has an equal number of 0s and 1s}\}$. Show $B$ is not regular.

Proof by contradiction: Assume $B$ is regular.

We know that $0^*1^*$ is regular, thus $C = B \cap 0^*1^* = \{0^n 1^n \mid n \ge 0\}$ is also regular (by closure under intersection).

But $C$ is exactly the language $D$ above, which we have already shown is not regular.

Contradiction found, thus $B$ is not regular.

2.1.2.2.1.8 Context-free languages

Context-free grammars are more powerful than finite automata.

A grammar is composed of variables and rules.

  • rule: variable -> string of variables and terminals
  • variable: non-terminal symbol; appears on the left side of some rule
  • terminal: symbol of the alphabet; appears only on the right side of rules (a right side may also be the empty string ε)
  • start variable: the designated variable from which every derivation begins (by convention, the left side of the first rule)

A grammar generates strings by starting with the start variable, then repeatedly replacing some variable with the right side of one of its rules, until no variable is left.

The terminals are the symbols that make up the final strings generated by the grammar.

2.1.2.2.1.8.1 Parse Trees

Start at the root with the start variable. For each rule applied, create one child node for each symbol on the right side of the rule.

When all leaves are terminals, the parse tree is complete.

2.1.2.2.1.8.2 Formal definition of CFG

Defn: A context-free grammar G is a 4-tuple $(V, \Sigma, R, S)$ where

  • V: finite set of variables
  • Σ: finite set of terminal symbols, disjoint from V
  • R: finite set of rules of the form $A \to \gamma$ where $A \in V$ and $\gamma \in (V \cup \Sigma)^*$
  • S: start variable, $S \in V$

For $u, v \in (V \cup \Sigma)^*$, we write:

  1. $u \Rightarrow v$: u yields v if G can go from u to v in one substitution step.
  2. $u \overset{*}{\Rightarrow} v$: u derives v if G can go from u to v in zero or more substitution steps, i.e. $u \Rightarrow u_1 \Rightarrow u_2 \Rightarrow \dots \Rightarrow v$. This sequence is called a derivation of v from u. If $u = S$, it is a derivation of v in G.

$L(G) = \{w \in \Sigma^* \mid S \overset{*}{\Rightarrow} w\}$ is the language generated by G.

Defn: A is a context-free language if there is a CFG G such that $A = L(G)$.

2.1.2.2.1.8.3 Ambiguity

For some CFGs, a string in the language may have more than one parse tree; such a grammar is called ambiguous. Equivalently, the string has more than one leftmost derivation (or more than one rightmost derivation).

2.1.2.2.1.8.4 PDA: pushdown automata

A PDA is a new kind of automaton: a finite automaton equipped with a stack as memory.

A PDA has a finite control and an input tape, with a head that reads the input from left to right.

The limitation of finite automata is their bounded memory. With a stack, a PDA has unlimited memory, but it can only be used in a restricted way: the PDA can push symbols onto the stack and pop them off the top.

A PDA accepts only if it is in an accept state at the end of the input.

Defn: A pushdown automaton is a 6-tuple $(Q, \Sigma, \Gamma, \delta, q_0, F)$, where

  • Q: finite set of states
  • Σ: input alphabet
  • Γ: stack alphabet
  • δ: transition function, $\delta: Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to \mathcal{P}(Q \times \Gamma_\varepsilon)$, for example $\delta(q, a, c) = \{(r_1, d), (r_2, e)\}$. Here ε means reading no input symbol, or reading or writing nothing on the stack.
  • $q_0 \in Q$: start state
  • $F \subseteq Q$: set of accept states

E.g., $B = \{w w^R \mid w \in \{0, 1\}^*\}$, with sample input 011110:

  1. Read input symbols and push them onto the stack; nondeterministically either repeat or go to step 2.
  2. Read input symbols and pop stack symbols, comparing each pair; if they ever differ, this thread rejects.
  3. Enter an accept state if the stack is empty at the end of the input.

Assume that every time the computation forks, the stack is duplicated for each branch.

2.1.2.2.1.8.5 Convert CFG to PDA

Theorem: If A is a CFL, then some PDA recognizes A. Proof: convert a CFG for A into a PDA.

IDEA: The PDA begins with the start variable and guesses substitutions. It keeps the intermediate generated string on the stack and compares it with the input as it goes.

P.S. The stack serves as a kind of cache for the intermediate generated string.

Whenever a terminal is on top of the stack, pop it and compare it with the next input symbol; repeat until a variable is on top of the stack.

  1. Push the start symbol onto the stack.
  2. If the top of the stack is a variable, nondeterministically choose a rule with that variable on the left side, pop the variable, and push the right side of the rule onto the stack. Otherwise, if the top of the stack is a terminal, pop it and compare it with the current input symbol; if they are equal, move on to the next input symbol, and if not, this branch rejects.
  3. If both the input and the stack are empty, accept.
2.1.2.2.1.8.6 Convert PDA to CFG

Theorem: A is a CFL iff some PDA recognizes A. The proof has two directions: every CFG can be converted to a PDA, and every PDA can be converted to a CFG.

Proof:

2.1.2.2.2  From The C Programming Language To Theoretical Computer Science (Section I) [S1]
2.1.2.2.2.1 Section I: C Programming Language

To get a glimpse of computer science, we first need to know a programming language, which can then lead us to some key concepts in computers and in programming language design.

2.1.2.2.2.2 Intro

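C has a long history. Since it appeared alongside Unix in the 1970s, it has been a favorite of developers all over the world, and it is still widely used today. From the dazzling variety of applications at the top down to operating system kernels at the bottom, everything can be written in C and depends on C code.

For example, the vast majority of the world's servers run Linux, and the Linux kernel is written almost entirely in C. Every Android phone also runs a Linux kernel, so it is fair to say that C drives most of the devices in the world. (Windows is not used as the example here because, first, it is a closed-source product, and second, its kernel is written largely in Microsoft's own heavily customized dialect of C and C++.)

C is a high-level language. But what is a high-level language?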

2.1.2.2.2.3 High Level Language

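High-level languages are defined relative to low-level languages. By low-level languages we usually mean the assembly languages of individual devices. They are very powerful, since they operate the CPU directly, and very fundamental: without them no later work could happen at all.

But they have a serious problem: they are tightly bound to a platform. A piece of code only works on one particular platform. Even if the logic is similar or identical, you still have to adapt it to each platform's rules in turn. That is only development, and it already shows how painful low-level programming is. When it comes to upgrading software, the same process becomes even more frightening and the complexity shoots up.

A high-level language is an abstraction over the common features of low-level languages. It helps programmers write code that can be ported between platforms painlessly, or at least fairly easily.

A low-level language is like a special tool made for one particular device: it can only be used on that device. It can operate the hardware directly, but it is very complicated to write. A high-level language such as C or Python lets programmers write programs in a form that is easier to understand. The system "translates" your code into instructions the machine understands, so the program can run on different devices without you worrying about the low-level details.

When working in C, we can abstract operations such as addition, subtraction, multiplication, and division, each corresponding to assembly instructions for data of different widths; we can abstract variables, each corresponding directly to a region of memory.

For example, take adding two numbers. In C, addition on any platform is written a + b. In Intel-syntax macro assembly for the x86_64 architecture of IBM-compatible machines (what a long qualifier), it might be ADD AH, BH, or ADD AX, BX, or ADD EAX, EBX, or even ADD RAX, RBX. And this only covers the case where two general-purpose registers take part; once memory is involved, it gets much more complicated. (AT&T syntax is more complicated still, since it also has to deal with instruction-name suffixes.)

This makes porting programs far more convenient: we no longer need to adapt to each platform by hand.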

2.1.2.2.2.3.1 Mid-Level Language

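Although C is nominally a high-level language, many people disagree, because C does not provide a general memory management scheme. All memory has to be managed by the programmer by hand. This is convenient for systems programming, but it also causes many problems such as memory leaks, and the programmer still has to think about the same boundary issues as in assembly.

For this reason, some people call C a mid-level language, or a transitional language. This is only a difference in naming, though.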

2.1.2.2.2.3.2 Compile & Interpret

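A CPU can only understand and run binary machine code, so code written in human-readable form cannot be executed directly. It first has to be compiled or interpreted.

source code → [compile] → assembly file → [assemble] → object file → [link] → executable
  1. Compilation translates the code into assembly language (or another language), runs an assembler to produce the corresponding binary code, and finally links it into a native executable (which ends up containing the structures the operating system requires).
source code → [interpreter] → output
  2. Interpretation skips the compilation step: a virtual machine or an interpreter executes the code as it reads the source file.

In fact, for modern languages the difference between compiled and interpreted languages is not that large. Java, for example, both needs to be compiled to JVM bytecode and needs the JVM to interpret that bytecode.

We call a language compiled or interpreted according to how it usually runs. C requires compilation before it can run, so we consider C a compiled language. Python, which you may be familiar with, is executed by an interpreter, so we consider Python an interpreted language.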

2.1.2.2.2.4 Environment And IDE

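Do you like playing games on a PC? Sometimes a game complains that the DirectX runtime environment is missing. Programming, like gaming, needs an environment. We call an environment set up specifically for developing programs a development environment. A software system that bundles and preconfigures all the tools needed for development, together with the development environment itself, is called an integrated development environment (IDE).

On Windows, the most commonly used C IDE is Microsoft Visual Studio. However, this IDE and its accompanying environment are tailored to C++ and C#, and are not a great fit for C. Its mandatory project management and its overabundance of features also tend to overwhelm beginners and distract them from the core of learning C.

On macOS, Apple provides the Xcode IDE, though hardly anyone uses it unless they have to write Swift.

On Linux, the most commonly used "IDEs" are (Neo)Vim and Emacs, which are not suitable for everyone.

Because the platforms are hard to unify, and all three offer a fairly simple way to use the LLVM Clang compiler as a C programming environment, we will set up the environment by hand as the first step in learning C. This is a part that most tutorials, training courses, and schools do not teach, yet it is crucial for later programming study.

Two other parts that I personally consider important are the use of tools, and the difference between tools and knowledge. They are covered in "The Missing Semester of Your Computer Science Education" and "Introduction to Theoretical Computer Science" respectively.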

2.1.2.2.2.4.1 Environment Variables

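Environment variables can be seen as settings for programs: they tell a program how to work. For example, configuring "PATH" helps programs find the files or commands they need.

Put simply, for a program this is like a dictionary index: to look something up, first find the "key" in the index, then use the "key" to get the "value".

These key-value pairs can control what a program does. The environment variables you need to know now, and which will stay important later, are:

  • PATH: the PATH variable is like a signpost that tells the system where to look for the commands you type.
  • For example, when you want to compile a program with gcc, the system looks for the gcc program in the folders listed in PATH. If it cannot be found, an error is reported.
  • When we type a command in the console (command line) and try to execute it, the operating system searches through the PATH environment variable. If the command is found, it is executed; if not, an error is reported.
  • Of course, PATH is not only used when we run commands ourselves; many other programs also use PATH to find the programs they need. (The dynamic linker, ld-linux-x86-64.so, applies a similar search-path idea to shared libraries, though through its own settings such as LD_LIBRARY_PATH.)
  • Okay, for now knowing just PATH is enough (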
2.1.2.2.2.4.2 Windows

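On Windows, changing environment variables is convenient and safe:

Open File Explorer, right-click "This PC", and in the pop-up menu choose "Properties" - "Advanced system settings" - "Advanced" - "Environment Variables" to see the configuration window.

To edit any entry, just double-click it and the editing dialog appears.

So, to install a C development environment by hand, you first download a compiler and then add the path where the compiler lives to the PATH environment variable as described above. Compared with other approaches, however, this is inconvenient, and updating the development environment later is a hassle too.

Windows also has an easier way to install a C programming environment: WSL.

WSL stands for "Windows Subsystem for Linux", a tool Microsoft created to improve the developer experience. With WSL we can install and manage a development environment very easily, just as if we were using Linux directly.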

2.1.2.2.2.4.3 Linux, MacOS & *nix

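On Unix and Unix-like systems, changes to environment variables are usually tied to user configuration files. In practice, though, installing a C programming environment on such systems requires hardly any changes to environment variables; a few commands are enough.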

2.1.2.2.2.5 Hello, World

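And so we arrive at our first program: Hello, World!

This example comes from The C Programming Language, and it has accompanied generation after generation of new programmers, carrying our cheer for the new world we create.

"Hello World" is the classic first example in programming. It simply prints one sentence to the screen and helps you understand the basic structure of code and how it runs. Once you can write and run "Hello World", you can start learning more complex programs.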

#include <stdio.h>

int main(void) {
  printf("Hello, World!\n");
  return 0;
}

You can type this code into any text editor and save it (not on the desktop) as hello.c.

Then we can start compiling:

  1. Open a terminal.
  2. Enter the directory: cd ${pwd}, where ${pwd} is the directory your file is placed in.
  3. Check that the file hello.c exists: type cat hello.c and press Enter. The content of the whole file will be displayed. If what is printed on screen does not match what is in your editor, the file has not been saved properly. For example, with the code shown above, the command responds with:

    #include <stdio.h>

    int main(void) {
      printf("Hello, World!\n");
      return 0;
    }

    on my computer.

  4. Finally, type clang hello.c -o hello. It prints nothing if there are no syntax errors or other problems.

We then get a file named hello (on Windows it may be hello.exe, where hello is the file name and .exe is the extension); you can find it in the file explorer. This is our target executable!

Finally, type ./hello in the terminal to run it, and you will see the result:

Hello, World!

With that, you have written out the basic building blocks of a C program. Next, we will briefly go through what each of them means, so that you can try modifying the program and write a "Hello World" of your own.

Try changing the source code so that it prints your name.

2.1.2.2.2.5.1 Explanation

Looks fantastic?

Let us explain the structure of our current program.

C programs are always composed in a similar order. Our "Hello, World" program has three parts: the header file import, the entry function (main), and the main expression.

2.1.2.2.2.5.2 Library

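The core of C is small and includes only some very basic features; everything else is provided through libraries. And because the language is fairly bare, using a library requires a description file that tells the compiler what the library provides.

In this program, for instance, the first thing is a line of text starting with '#'. It says that we bring in the definitions of a library called stdio.

The '#' sign actually marks the start of a "preprocessing directive", and the directive here is "include". The include directive is used to include a file; here it includes the file stdio.h.

Stdio is short for "Standard Input / Output". It defines the common input and output functions, and it will be the library we use most often in C programming.

How does the include directive decide which file to include? That depends on what the file name is wrapped in. Here we wrapped stdio.h in angle brackets ('<' and '>'), which means the compiler searches the system paths and, if it finds the file, expands it in full at the position of the directive. If we wrapped stdio.h in double quotes ('"') instead, the compiler would first try to find the file in the current directory.

Try creating a stdio.h file in the same directory as hello.c and compiling the program again to see whether anything changes.

What if you change the angle brackets to double quotes? The printf "function" we discuss below, for example, is announced to the compiler by stdio.h.

So what is a function? Let us keep that a secret for now; functions are explained in detail later.

Next comes the body of our program.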

2.1.2.2.2.5.3 main
int main(void) {
  // ...
}

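This part is where our program starts executing. Without it, the program cannot run.

Try it: what happens if you leave this part out and write only the printf("Hello, World!\n"); in the middle? When we press the run button, we are told that the program is not "legal". That does not mean we broke the law, of course; it means the program does not conform to C's syntax.

Also, if you look at the "PROBLEMS" panel at the bottom of Visual Studio Code, you can see that it reports many problems in this file. We call what it reports error messages, or errors.

We call this part the "main function definition", and this main is the main function.

It can basically be treated as a fixed form (there are four fixed forms in total, three for hosted environments and one for freestanding environments, but for now you only need to know this one).

printf("Hello, World");

is the only thing in the body of our program. The program really does just this one thing: it prints "Hello, World".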

2.1.2.2.2.5.4 Function

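In the two parts above we mentioned the concept of a "function". What is a function? A function is a series of code, a collection of functionality. By defining functions we can group different operations together, which makes development easier. Such functions can also be offered to ourselves or to other people for use.

Examples are the printf function we used and the main function we defined.

Like functions in mathematics, a function can take some arguments and produce some output. Just like a multivariable function in multivariable calculus,

$f(x, y, z): \mathbb{R}^3 \to \mathbb{R}$

which takes the arguments x, y, z and, through a series of transformations, turns them into a plain one-dimensional value.

The combination of printf and the parentheses after it is called a function call. Its meaning is the same as function application in mathematics.

printf(...) prints text to the screen according to a given format: "print (with) format" is exactly what the name means.

The "Hello, World" here is the argument of the function call. It tells the printf function what to output to the screen.

This is only a brief introduction, though. printf can do far more than this! A later chapter is devoted to it.

return 0;

This statement terminates the function "main". When execution reaches this statement, the function finishes running and "returns".

This touches on material that comes later, so for now just remember to end the main function with return 0;.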

2.1.2.2.2.5.5 Expression: Statement.

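If you looked carefully, you will have noticed that both things inside the main function end with a semicolon.

The semicolon (';') marks the end of a statement. What is a statement? Statements are the basic units of a C function body; every C program that does something is made up of statements. For comparison, the simplest possible program is:

int main(){}

It contains only a function definition with an empty body. Any program that actually does something needs at least one statement.

Statements come in many varieties, but the rules for them are much the same. Apart from a few special cases, every piece of code written in C ends with a semicolon.

Statements can be roughly divided into five kinds:

  1. expression statements
  2. function calls
  3. control flow statements
  4. compound statements
  5. empty statements

Each kind will be explained in detail later. For now, remember that a statement normally ends with a semicolon.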

2.1.2.2.2.6 Types

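C is a statically typed language. That one sentence already involves two new ideas!

  • What is a type?
  • What is static typing?

As a computer language, C really just manipulates values. For different values, we decide by convention what "type" each one has.

For example, we treat integers between $-2147483648$ ($-2^{31}$) and $2147483647$ ($2^{31}-1$) as "integers (Integer)". We also need to represent text, so there are the "character (Character)" type and the "string ([Character] String)" type.

But why separate types at all? Obviously, a string cannot be handled as if it were an integer, right? (Unless you view both as monoids in the sense of category theory... and even then you only unify the operations; you still cannot add a string to a number~)

So what is static typing?

Just as mathematics is not only about manipulating numbers but is mostly about unknowns, a computer program has its own "unknowns" to work with. When we compute something, we often need a "variable" to store intermediate results. Since a variable stores data, it needs a type too. After all, as just explained, different types of data have different properties and cannot be stored in the same way.

C goes one step further: to keep a variable's type from becoming unclear after repeated assignments, it makes us fix the type of data the variable can hold at the moment we define it. (That is not the real reason, of course. C needs the type information in order to allocate space for the variable, and different types generally need different amounts of space, so they cannot be mixed. The "memory model" part will explain this in detail, meow~ >w<) This is what we call a "static type" system.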

2.1.2.2.2.6.1 Literal

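A literal is like the coefficients and constants we write down when solving a math problem: literals are the constants that appear directly in a program.

There is a difference from constants, though. A literal truly cannot be changed. A constant in a computer program only states that a variable will not be changed... with some special tricks, we can get a constant to open its heart and accept a new value.

2.1.2.2.2.6.2 Basic Data Types

For simple programming tasks, C defines some basic data types. They cover numbers, text, and logic (okay, not really).

2.1.2.2.2.6.2.1 Integer

The most commonly used types, and the first we introduce, are the integer family:

  • short: short integer; needs less memory than int (typically 16 bits), and correspondingly represents a smaller range of values.
  • int: integer, the default integer type in C; typically 32 bits, of which 31 binary digits carry the magnitude. The range $-2147483648$ to $2147483647$ mentioned above is what it can represent.
  • long: long integer; may be longer than int, generally used only when handling large values.
  • long long: really long integer; guaranteed to be at least 64 bits.

Whenever we write an integer in code, it naturally carries one of the types above. For example: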

short s = 0;
int i = 65536;
long l = 2147483647;
long long ll = 2147483648ll;

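Note: all of the code above is written inside the main function!

Here 0, 65536, and 2147483647 are all literals of type "int", while 2147483648ll is a literal of type "long long".

What do the types and the equals signs in front of these numbers do? You will see shortly! First, let us meet the integer variants:

  • signed: the signed prefix, meaning the type holds signed data. Integer types are generally signed by default.
  • unsigned: following on from the previous item, when we do not need the negative part we can use an unsigned type. When a variable is declared unsigned, its range changes from half positive and half negative to entirely non-negative, roughly like putting a superscript $+$ on $\mathbb{Z}$ to get $\mathbb{Z}^+$ (plus zero). What is more, the positive range doubles.
  • Although they are called prefixes, they can also work alone: when only the prefix appears, the C standard automatically supplies an int for it.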

这里可以再来几个例子:

signed int i = 2147483647;
unsigned int u = 2147482647u;

Integer literals may be written as:

<number>*<suffix>     decimal      ; 10, 11, 5
0<number>*<suffix>    octal        ; 0, 01, 077
0x<number>*<suffix>   hexadecimal  ; 0x0, 0x1a, 0xff
0b<number>*<suffix>   binary       ; 0b1, 0b0, 0b10 (standard only since C23; a compiler extension before that)
2.1.2.2.2.6.2.2 Literal Suffix

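Some of you may have noticed that some of our numbers are followed by letters. These letters, such as ll and ull, are called literal suffixes. They qualify the literal so that the compiler handles the value correctly.

Now look at this line:

long long ll = 2147483648ll;

Try removing the ll suffix from the literal and see what happens. With a modern compiler, usually nothing changes.

This is because an integer written in C starts out as int only if it fits: since C99, an unsuffixed decimal literal that is too large for int automatically takes the first of long and long long that can hold it. The suffix still matters when you want to force a wider type, for example to keep an expression such as 65536ll * 65536 from being computed in int and overflowing.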

2.1.2.2.2.6.2.3 Real numbers: float float & double double

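Besides integers, we naturally also have fractional numbers. In C these are called "binary floating-point numbers", or "floating-point numbers" for short.

C has three commonly used floating-point types:

  • float: single-precision floating point, typically 32 bits. Unlike integers, floating-point numbers do not have an exactly representable range.
  • double: double-precision floating point, more precise than float; it is the type of an unsuffixed literal such as 1.0.
  • long double: an upgraded version of double precision.

But why is it called floating point? Because its point is not fixed, of course.

Some may still wonder what a fixed point is. Isn't the number of decimal places normally unlimited? The answer, once again, lies in the limits of what a computer can represent.

For example, an amount of money can usually be written as "XX yuan Y jiao Z fen". To express it entirely in yuan, we write "XX.YZ yuan". In effect we have unified every unit into yuan and fixed jiao and fen at the two places after the point. This is a "fixed-point number", or more precisely a "fixed-point number scaled by 100".

With fixed point understood, "floating point" (or "moving point", a name I just made up) is easy to grasp. Fixed point is too rigid and only suits certain scenarios. So the idea arises: if we record the position of the point in some way, we can represent fractional numbers of any form. Thus floating point was born. The fixed-point number above is decimal, base 10, while computers represent data in binary, so the floating-point numbers we actually use are binary too. That explains the name "binary floating-point number".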

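2.1.2.2.2.6.2.4 Type Promotion

In mathematics we also mix integers and fractions. Try it first: what do we get in C when an operation ought to produce a fraction?

printf("%d", 1 / 2);

The result is 0. Strange, isn't it?

In C, an operation between two integers only ever produces an integer. To get a floating-point result, a floating-point number must take part in the operation, for example

printf("%f", 1 / 2.0);

This gives 0.5 (printed as 0.500000).

Why? In C, when the operands of an operation have different types, the one with the smaller range is converted to the type with the larger range before the operation is carried out. This is called automatic (implicit) type conversion.

Here the int 1 meets 2.0, a floating-point literal of type double. Floating-point types have a larger range than integers, so the int is promoted to double and the computation becomes 1.0 / 2.0 = 0.5.

The order of automatic type conversion is:

small -------------------------------------------------------> large
char, short, int → unsigned int → long → long long → float → double → long double

From left to right, types are promoted automatically.

Conversion that starts from the small integer types is called "integer promotion". Note that char, short, and int sit at the same stage: the C rules promote every type whose range is smaller than int up to int before it takes part in an operation.

In the words of the standard: a char, a short int, or an int bit-field (signed or unsigned), or an object of enumeration type, may be used in an expression wherever an integer may be used. If an int can represent all values of the original type, the value is converted to int; otherwise it is converted to unsigned int. This process is called integer promotion.

From the assembly point of view, this amounts to widening a small register into a larger one, for example extending the AH register into EAX.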

2.1.2.2.2.6.2.5 String & Char

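The other part, besides numeric values, is the character type and strings.

When studying mathematics, we write a computed result directly after the word "Solution"; that is really an "output" step. How does a computer, which also performs mathematical computation, organize its output? With strings, of course:

printf("This Is A String");

It is the familiar printf again; what differs is the string it operates on.

A string, as the name suggests, is a consecutive sequence of characters. We normally write a string literal as a run of text enclosed in double quotes.

How do we write a single character, then?

Simple: besides double quotes, we also have single quotes. Ideally, any single character enclosed in single quotes is a character. But because some characters cannot be typed on a keyboard at all, there are other forms too:

  • 'c': a character enclosed in single quotes
  • '\ooo': a character given in octal
  • '\xhh': a character given in hexadecimal

Of course, some characters go far beyond what a character can hold (8 bits), so there is another character type: the "wide character" type.

  • L'c': a wide character enclosed in single quotes
  • L'\ooo': a wide character given in octal
  • L'\xhhhh': a wide character given in hexadecimal

As you can see, a wide character literal is just an ordinary character literal with an "L" prefix. In the same way, we can turn an ordinary string literal into a wide string:

wprintf(L"Hello World");

Note: Chinese characters all fall outside the range a character type can represent, so why can an ordinary string hold Chinese text, as in printf("你好, 世界");? Because in a string, one character is not necessarily represented by one character variable. This may sound confusing now, but it will be easy to understand once we discuss how strings are actually represented.

So wide strings are not especially necessary for representing text.

By the way, did you notice that when describing the integer types we never mentioned an 8-bit integer, the counterpart of the byte type that is common in other languages? That is because C uses the char type in place of an 8-bit integer. Fortunately, 8-bit values are not needed very often in C, so this substitution is not a big problem. When we really need one, we can use char as a stand-in.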

2.1.2.2.2.6.3 Logical Values

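Of course, computers do not only process numeric values. As something assembled from diodes, transistors, and logic gates, a computer has a naturally binary representation, and binary logic is also part of what programs deal with.

Starting simple: logic has two states, yes or no. In C we represent them in a very simple way:

  • value equal to 0: no (false),
  • anything else: yes (true).

Simple, right?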

2.1.2.2.2.6.4 Void Type

The types above are all quite concrete. But what if we need to express "there is nothing here"?

That is when we need the void type. We will not explain much here; we will see it in use later.

2.1.2.2.2.7 Mathematics Operations

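Having numbers is not enough to compute with; we also need to define operations on them.

First of all, every numeric value, whether from the integer family or the floating-point family, supports the familiar four arithmetic operations: +, -, *, and /.

Operations   Description                                                              Form
+            Adds two numbers and returns the sum                                     A + B
-            Subtracts the second number from the first and returns the difference    A - B
*            Multiplies two numbers and returns the product                           A * B
/            Divides the first number by the second and returns the quotient          A / B

Because taking a remainder is so useful, C also defines ways to take remainders of integers and of floating-point numbers, and calls this operation "modulo":

Operations   Description              Form          Comment
%            Modulo (integers)        A % B
fmod         Floating-point modulo    fmod(A, B)    Function call; for double values
fmodf        Floating-point modulo    fmodf(A, B)   Function call; for float values
fmodl        Floating-point modulo    fmodl(A, B)   Function call; for long double values

Next are four operators in C that act on variables (most commonly integer variables). They are called the "increment/decrement operators".

Operations   Description   Form   Comment
++           Increment     A++    Returns the original value first, then increases the variable by 1
++           Increment     ++A    Increases the variable by 1 first, then returns the increased value
--           Decrement     A--    Returns the original value first, then decreases the variable by 1
--           Decrement     --A    Decreases the variable by 1 first, then returns the decreased value

As you can see, the increment and decrement operators follow a pattern. If the operator is in front of the variable, the variable is modified first and its value is taken afterwards. If the operator is behind the variable, the value is taken first, and the variable is incremented or decremented after the value has been used.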

int i = 0;
printf("%d", i++); // => 0, i = 1;
printf("%d", ++i); // => 2, i = 2;
printf("%d", i);   // => 2
printf("%d", i--); // => 2, i = 1;
printf("%d", --i); // => 0, i = 0;
printf("%d", i);   // => 0

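You can also see that these operators are described as acting on a "variable" rather than on a value. What is a variable? As mentioned earlier, a variable is something that stores a value. Since a variable can store a value and take part in operations, it is natural to have operators that act on the value stored in the variable itself. Besides the increment and decrement operators described here, there are others, such as the assignment operators.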

2.1.2.2.2.7.1 Relation Operations

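Besides arithmetic, we can also compare values. In C, the operators used to compare the sizes of different values are called "relational operators".

Relational operators work on all numeric values. For strings, comparison is also very common, so string comparison functions are included in the standard library. Do you remember what a "library" is? A library is something written by other people rather than provided by the C language itself; it defines a set of useful functions that can be imported.

Okay, we digress. Here are all the commonly used relational operators (and functions):

Operations   Description              Form              Comment
==           Equal                    A==B              Returns 1 if A equals B, otherwise 0
!=           Not equal                A!=B              Returns 1 if A does not equal B, otherwise 0
>            Greater than             A>B               Returns 1 if A is greater than B, otherwise 0
<            Less than                A<B               Returns 1 if A is less than B, otherwise 0
>=           Greater than or equal    A>=B              Returns 1 if A is greater than or equal to B, otherwise 0
<=           Less than or equal       A<=B              Returns 1 if A is less than or equal to B, otherwise 0
strcmp       String comparison        strcmp(A, B)      Returns 0 if the two strings are equal; otherwise a negative or positive value according to dictionary order
memcmp       Memory comparison        memcmp(A, B, n)   Compares the first n bytes of two memory regions, with the same sign convention as strcmp

One thing must be noted: C has no chained inequalities. In other words, C cannot express something like $A > B > C$ with its mathematical meaning.

What happens if we write such code by accident, say 1 < a < 10?

C treats this as a sequence of operations: the first expression is evaluated, and its result then takes part in the next one. Such sequences follow precedence rules, just as multiplication and division always come before addition and subtraction in arithmetic.

So for the expression above, 1 < a is evaluated first, and the result, whether 1 or 0, is handed to the comparison with 10. As a result, the value of the whole expression is always 1.

So be careful never to write a "chained inequality".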

2.1.2.2.2.7.2 Logical Operations

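Logical operations are also something C performs all the time. What is a logical operation?

A logical operation strings several logical values together and determines whether the final result is true or false.

For example, we just said that C has no chained inequalities. How do we express such a relation, then? This is where logical operations come in.

There are three main logical operations: or, and, and not.

Operations   Description    Form    Comment
&&           Logical AND    A&&B    Returns 1 if A and B are both non-zero, otherwise 0
||           Logical OR     A||B    Returns 1 if at least one of A and B is non-zero, otherwise 0
!            Logical NOT    !A      Returns 1 if A is 0; returns 0 if A is non-zero

From this you can also see that logical and, or, and not are quite different from logic gate operations. The bitwise logical operations are therefore introduced separately later.

Back to expressing a chained inequality: just write

1 < a && a < 10

Note that the logical operators && and || "short-circuit". What does that mean? If the left side of the operator already determines the overall result, the right side is not evaluated at all, and the result is returned immediately.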

2.1.2.2.2.7.3 Associativity

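As mentioned above, operator precedence and associativity decide the order in which a sequence of operations is evaluated. What exactly are the rules?

In the table below, precedence decreases from top to bottom: operators in higher rows are applied first. The last column gives the associativity of operators within the same row.

Operations                               Description       Associativity
() [] -> . ++ --                         Postfix           Left to right
+ - ! ~ ++ -- (type) * & sizeof          Unary             Right to left
* / %                                    Multiplicative    Left to right
+ -                                      Additive          Left to right
<< >>                                    Shift             Left to right
< > <= >=                                Relational        Left to right
== !=                                    Equality          Left to right
&                                        Bitwise AND       Left to right
^                                        Bitwise XOR       Left to right
|                                        Bitwise OR        Left to right
&&                                       Logical AND       Left to right
||                                       Logical OR        Left to right
? :                                      Ternary           Right to left
= += -= *= /= %= >>= <<= &= ^= |=        Assignment        Right to left
,                                        Comma             Left to right

Complicated, isn't it? No need to worry: whenever you are unsure about precedence, simply put parentheses around the part you want evaluated first. The rest matches mathematical intuition quite well.

You may have noticed that, besides the basic arithmetic operations we have covered, the table contains operators we have not seen yet.

Looking closely, besides logical AND and logical OR, the table also has bitwise AND, OR, XOR, and NOT. We will get to them soon.

PS. Another important group is the assignment operator family, which will be introduced after we have gone through the syntax of C again in full.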

2.1.2.2.2.7.4 Binary Calculation

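Now we need a little simple mathematics: binary arithmetic.

First, what is binary arithmetic? It is arithmetic on binary numbers. That may sound like a tautology (okay, it is one), but it actually says quite a lot.

To begin with, it says that the objects being operated on are binary numbers, that is, numbers that carry over to the next digit at two.

The radix of binary is 2, so each digit can only be 0 or 1.

Binary numbers have some special properties. The most notable advantage is that each digit has only two states, which matches the on and off of a circuit exactly and makes life easy for the computer. Another property is that binary converts easily to and from hexadecimal and octal, although this is really an advantage of hexadecimal and octal, since both have radices that are powers of two.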

2.1.2.2.2.7.5 Radix Convert

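Binary is friendly to computers but awkward for humans, because we deal with decimal all the time.

So we need to handle all kinds of "radix conversion" problems.

Binary and decimal both represent numbers from the same set, so they can be converted into each other by fixed rules.

Converting binary to decimal means multiplying each digit by the corresponding power of two and adding up. It may sound complicated, but it is very simple in practice. For example, for the binary number 1011, the decimal value is:

$(1011)_2 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = (11)_{10}$

Converting decimal to binary is similar: keep dividing the decimal number by two and record the remainders:

$11 \div 2 = 5 \ldots 1$
$5 \div 2 = 2 \ldots 1$
$2 \div 2 = 1 \ldots 0$
$1 \div 2 = 0 \ldots 1$

Finally, writing the remainders from bottom to top gives the binary number, here 1011.

As mentioned above, conversion between binary and hexadecimal or octal is very convenient. How convenient? To convert binary to hexadecimal, group the digits in fours, pad the high end with 0 if needed, and replace each group with its hexadecimal digit. Octal works the same way with groups of three.

Continuing with 1011:

$(1011)_2 = (B)_{16}, \quad (1011)_2 = (001\,011)_2 = (13)_8.$

The reverse direction works in exactly the same way and is just as convenient.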

2.1.2.2.2.7.6 Bitwise Operations

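Besides ordinary arithmetic, binary also offers some special operations of its own. In C these show up as the bitwise operations.

In a computer, gate circuits provide the following logic gates: AND, OR, NOT, NAND, NOR, XOR, and XNOR.

Their behavior is shown in the table below:

Operations   Description   Form       A      B      Result
AND          and           A AND B    1010   1100   1000
OR           or            A OR B     1010   1100   1110
XOR          exclusive or  A XOR B    1010   1100   0110
NAND         not-and       A NAND B   1010   1100   0111
NOR          not-or        A NOR B    1010   1100   0001
XNOR         exclusive nor A XNOR B   1010   1100   1001
NOT          not           NOT A      1010   -      0101

The rules are very simple:

  • An AND gate outputs 1 if and only if both inputs are 1; otherwise it outputs 0.
  • An OR gate outputs 1 if at least one input is 1; otherwise it outputs 0.
  • A NOT gate inverts its input: it outputs 0 for input 1, and 1 otherwise.
  • A NAND gate is an AND gate inverted: it outputs 0 only when both inputs are 1, and 1 otherwise.
  • A NOR gate is an OR gate inverted: it outputs 1 only when both inputs are 0, and 0 otherwise.
  • The point of an XOR gate is "exclusive": it outputs 1 when the two inputs differ, and 0 otherwise.
  • An XNOR gate is an XOR gate inverted: it outputs 1 when the inputs are the same, and 0 otherwise.

So every gate whose name contains "not" can be obtained from AND, OR, or XOR followed by an inversion, and in fact every gate can be built from NAND gates alone.

The low-level implementation of a computer has logic gate operations, and C has the corresponding bitwise operations. A bitwise operation applies a gate to each bit of multi-bit binary numbers. There are four of them:

Operations   Description    Form   Comment
&            Bitwise AND    A&B    A result bit is 1 if the corresponding bits of A and B are both non-zero
|            Bitwise OR     A|B    A result bit is 1 if at least one of the corresponding bits of A and B is non-zero
^            Bitwise XOR    A^B    A result bit is 1 if exactly one of the corresponding bits of A and B is non-zero, otherwise 0: different gives 1, same gives 0
~            Bitwise NOT    ~A     Each bit that is 0 becomes 1; each bit that is non-zero becomes 0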
2.1.2.2.2.7.7 Overflow

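A computer operates on binary numbers, but its capacity is finite; unlike mathematics, it cannot represent ideal, arbitrarily large integers.

So when a number exceeds the range the computer can represent, an "overflow" occurs. On most computers, when overflow happens the overflowing bits are discarded, and only a flag is left to record that an overflow occurred.

Most of the time we try to avoid overflow as far as possible, because it makes results differ from what we expect. When defining variables, we therefore need to estimate the range of the data in advance and choose a suitable type for each kind of data.

But overflow is not always a bad thing; sometimes working at the level of raw bits brings special advantages. The famous "Quake III" fast inverse square root algorithm is often cited as an example of this kind of bit-level trickery (strictly speaking, it relies on reinterpreting the bits of a floating-point number as an integer, followed by a Newton iteration, rather than on overflow itself).

The way computers represent negative numbers is also closely tied to overflow.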

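2.1.2.2.2.7.8 2's Complement

The data a computer can represent is finite. Early CPUs could only compute with 8-bit binary numbers, which is very small: only values from 0 to 255. Even today, a computer natively handles only 64-bit data. As long as we only consider non-negative numbers, this is not a big problem: within range, adding directly gives the result we need, and even when an addition overflows, it is fairly easy to deal with.

But once negative numbers have to be considered, things change. We must find a way to tell whether a number is positive or negative.

The most naive idea is to give up one bit of range and use it to mark the sign. This gives the "sign-magnitude representation" of integers.

When the value is positive, its sign-magnitude code is the same as the true value. When the value is negative, the highest bit is written as 1. In other words, the highest bit serves as a sign bit that records whether the number is positive or negative.

Sign-magnitude causes serious problems in arithmetic. When a negative number takes part, its highest bit is 1, and binary addition with a positive number may give a wrong result, such as a larger negative number.

    0000'0000 0000'0111   (+7)
  + 1000'0000 0000'0111   (-7)
 -----------------------
    1000'0000 0000'1110   (-14)

So for arithmetic involving negative numbers, we cannot use plain sign-magnitude and simply set the highest bit of a negative number to 1.

An ideal representation of negative numbers must guarantee that a negative number plus the corresponding positive number gives 0 (with one bit overflowing out of the top).

To achieve this, we represent a negative number by inverting every bit of the corresponding positive number. This gives the "ones' complement" (1's complement).

But ones' complement has a similar problem. It avoids getting a larger negative number from adding a positive and a negative number, but a positive number plus its negative does not give the original 0; it gives all ones. This creates the problem of having both +0 and -0.

    0000'0000 0000'0111   (+7)
  + 1111'1111 1111'1000   (-7)
 -----------------------
    1111'1111 1111'1111   (-0)

So, since a number plus its negative does not give 0, we simply add a 1: take the ones' complement result, add 1, let the overflow be discarded, and the final result is the true 0 we want.

To make this practical, we fold that 1 into the representation itself. This gives the "two's complement" (2's complement).

Of course, this is a conclusion reached by experiment; two's complement has a deeper meaning.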

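2.1.2.2.2.7.9 N's Complement

The N's complement is really the additive group of residue classes modulo N. For

$\mathbb{Z}_n = \mathbb{Z} \bmod n, \quad (\mathbb{Z}_n, +)$

closure and associativity hold, so the residues modulo n form a group over $\mathbb{Z}$.

Given n, there are n residue classes modulo n, and for a, b with $\gcd(n, a) = 1$, the values $a \times r_i + b$ form a complete residue system modulo n.

For an element $a$ of $\mathbb{Z}_n$, taking $b = n - a$ gives $a + b \equiv 0 \pmod n$. If we restrict $a \le n - 1$, every negative number is congruent modulo n to a corresponding positive number, and n is the complement constant.

$-a$ is the additive inverse of $a$. Taking the complement with respect to $M$ then gives $-a = M - a$ with $M = 10^n$ (written in the working base). For $M$ itself, $-0 = M$ and $0 = 0$, which are congruent modulo $M$.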

2.1.2.2.2.7.10 Bitwise Shift

Apart from the regular bitwise operations, there are some special ones as well. Can you imagine shifting every digit of a number?

We have already mentioned floating-point numbers. You might think of floating point as a shift of digits, but floating-point numbers actually just move the position of the point.

In bitwise shift operations, the point stays fixed at position 0, and all digits move directly right or left.

  • Logical Shift Right: shift all digits right relative to position 0. Every digit pushed below position 0 is discarded. The high positions are padded with 0.

     | 0000'1001 0010'1111 | =>
    0 | 0000'1001 0010'111 | 1 =>
    00 | 0000'1001 0010'11 | 11 =>
    ...
  • Arithmetic Shift Right: mostly the same as logical shift right, but the high positions are padded with a copy of the sign bit.

     | 0000'1001 0010'1111 | =>
    0 | 0000'1001 0010'111 | 1 =>
    00 | 0000'1001 0010'11 | 11 =>
    ...

    For positive numbers, it behaves exactly like the logical shift.

     | 1111'1001 0010'1111 | =>
    1 | 1111'1001 0010'111 | 1 =>
    11 | 1111'1001 0010'11 | 11 =>
    ...

    For negative numbers, the padding digit is 1 instead.

  • Shift Left: shift all digits left relative to the highest position. Every digit pushed beyond the highest position is discarded. The low positions are padded with 0.

       <= | 0000'1001 0010'1111 |
     <= 0 | 000'1001 0010'1111 | 0
    <= 00 | 00'1001 0010'1111 | 00
    ...

Operations   Description   Form     Comment
<<           SHL           A << B
>>           SHR           A >> B   For negative signed values, the implementation chooses between logical and arithmetic shift

That was a brief look at the bitwise shift operations. You may notice that shifts are really just multiplication and division.

How?

SHL by n multiplies a number by $2^n$, and SHR by n divides it by $2^n$ (rounding down for unsigned values).

All discarded digits are treated as overflow.

2.1.2.2.2.8 Syntax

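C, as a language for communicating with a computer, has its own set of syntax rules.

In the previous sections we saw that code which does not follow those rules is rejected as "illegal".

It is therefore worth taking a systematic look at the syntax rules of C.

Here is our sample program: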

/// file: main.c
#include <stdio.h>

// main function, the entry
// (the third parameter, envp, is a common extension rather than standard C)
int main(int argc, char* argv[], char* envp[]) {
  int integer_value;
  float float_value = 1.0;

  printf("Hello, World!\n" /* comment can appear any where */);
  integer_value = 10;

  printf("Calculate a + b: %d + %f = %f", integer_value, float_value, float_value + integer_value);
  return 0;
}

/* foo function, void parameter and empty body */
void foo(void) {
  // do sth.
}

In the program above, several lines contain things we have not met before.

We will explain them all in this chapter.

2.1.2.2.2.8.1 Statements

The first thing I'd like to tell you is definition for statement.

The c program are composed with statements, just as what we have mentioned before.

Statements define the operation the program will execute. Each statement may have do something.

According to the C Programming Language Standard, every statement in c need to end with semi-colon (';'). Unless it is listed detailed that has no necessary to have semi-colon.

For example, we can see,

  int integer_value;
  float float_value = 1.0;
  printf("Hello, World!\n");
  integer_value = 10;
  int integer_value;
  float float_value = 1.0;
  printf("Hello, World!\n");
  integer_value = 10;

they all statements.

Also, multiple statements can be written in same line. You may see this:

int i; i = 1;
int i; i = 1;

From here, we written two statements, int i; int i; , and i = 1; i = 1;

So, it is not necessary to add line feed between two different statements.

They are added for beauty and clear.

Also, because the statement termination will just be determined by semi-colon, one statement may be written in multiple lines.

int
i
=
10
;

This is legal as well.

But we would not write code this way. A more common use of this feature is:

int i = 10,
    j = 20;
2.1.2.2.2.8.2 Expression

Now that we know what a statement is, the other important part of a C program is the expression.

An expression is a form that combines values and operations.

The most basic expressions we use in programs are calculations.

1 + 2
i = 0
printf("Hello, World")

These are all expressions, and each one evaluates to the result of its operation.

A statement may contain expressions, but an expression on its own is not yet a statement.

Most of the time, an expression produces a value that can be used later in the program.

Furthermore, expressions can be nested.

printf("%d", 1+1)

Here we have two expressions: the smaller one, 1+1, and the larger one that wraps it, printf("%d", ~).

Once we add a semicolon after it, the whole expression becomes a statement.

printf("%d", 1+1);

Now it is ready to do something.

As you may imagine, since a function call is a valid expression that can be turned into a statement, we can also add a semicolon after a calculation to get a statement.

1;
8*2;

But such statements are meaningless, because their results are discarded.
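As a small sketch of nesting (the function name nested_assignment is made up for this illustration), an assignment such as i = 5 is itself an expression with a value, so it can sit inside a larger expression:

```c
#include <stdio.h>

/* Hypothetical helper: shows that an assignment is itself an expression
   whose value can be nested inside a larger expression. */
int nested_assignment(void) {
    int i;
    int j;

    j = (i = 5) + 1;        /* the inner expression "i = 5" has the value 5 */

    printf("%d\n", i + j);  /* a function call expression turned into a statement */
    return i + j;           /* 5 + 6 */
}
```

Calling nested_assignment() prints 11 and also returns 11, because i ends up as 5 and j as 6.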

2.1.2.2.2.8.3 Code Block

When programming, we sometimes want several operations to be treated as a single unit.

For that we need code blocks, also called "compound statements". They are statements grouped together and wrapped in one pair of braces. For example:

{
  int x;
  x = 1;
}

The rest of the program treats them as a group, that is, as one large statement.

No semicolon is needed after the closing brace.

2.1.2.2.2.8.4 Empty Lines & Space

Spaces are not only for tidiness; we also need them in code to separate different syntax elements.

For example, why do we always need a space between int and i? Because if we dropped it, the compiler would only see inti, a single name it does not know, rather than a type followed by a variable name.

It is the same reason we put spaces between words when writing.

So, wherever removing a space does not change the structure of the code, that space can be deleted.

Empty lines, that is, lines that contain no code, behave much like spaces. When one is not required, it is there only for readability and can be removed.

The example below shows a case where spaces can be discarded.

int x = 1;
// Equivalent to
int x=1;
2.1.2.2.2.8.5 Comment

Comments are another thing that does not affect the behaviour of our code. When the compiler meets a comment, it simply ignores it, which means a comment behaves like a space in our code.

There are two ways to write comments.

  • /* ... */ : multi-line comment, also usable inline; anything between /* and */ is ignored.
  • // ... : single-line comment; anything after it up to the end of the line is ignored.

The program at the start of this chapter uses both kinds.

2.1.2.2.2.9 Variables & Variable space

Here we come to the most important part of a program. We will learn what a variable is, how it is defined, and which operations can be done on it.

First of all, let us look at the relation between variables and values.

2.1.2.2.2.9.1 Data, Variable, Value

Data is something that represents or carries information; it is always the object we manipulate in a program.

But how can we describe a piece of data? We use something called a "variable": a slot with the required amount of space for storing data.

Thus, in general, a variable is a space, a slot, that can store a value carrying some specific data.

2.1.2.2.2.9.2 Definition

Before we use a variable in our program, we must define it.

The basic forms of variable definition are listed below:

<variable-type> <variable-name>;
[<decorator>] <variable-type> <variable-name> [= <literal-value>];

Also, we have another way to declare a variable:

extern <variable-type> <variable-name>;

From all of these, we can see that to declare a variable we write it in the "type name;" form.

Here, type can be any type specifier mentioned in the types section above.

For example,

int a;
int b;

Furthermore, once we have learnt about structures, enumerations, unions and functions, we will have more forms of types.

2.1.2.2.2.9.3 Variable Name

One must-have element of a variable definition is the type. The other is the variable name.

Once we have defined a variable, we can refer to it by its name.

Just like calling someone by name.

Variable names in C must follow some rules:

  1. start with '_' or a letter ('$' is accepted by some compilers as an extension, but is not standard C),
  2. contain no spaces,
  3. continue with '_', letters, and digits (and '$' where the compiler allows it),
  4. stay within 63 characters, because the standard only guarantees that the first 63 characters of a name are significant,
  5. not duplicate another name defined in the same scope, and not be a keyword such as 'int'.

Keywords are words reserved for special use in C programs, for example int, if, continue. C also has some names reserved for future use. So, although some of those names can be used, doing so is not encouraged.

Here are the most commonly used keywords and reserved names:

auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, inline, int, long, register, restrict, return, short, signed, sizeof, static, struct, switch, typedef, typeof, union, unsigned, void, volatile, while, _Generic

Beyond the keywords, which cannot be used at all, there are extra naming rules and conventions.

Names that start with two underscores ('_'), and names that start with one underscore followed by a capital letter, are reserved for the compiler and its standard library.

Names that start and end with two underscores are conventionally used by system-wide libraries.

Names that start with one underscore and a lower-case letter, and end with one underscore, are conventionally left to libraries.

Names written in all capital letters, with words separated by underscores, conventionally denote constants.

2.1.2.2.2.9.4 Initialize

Finishing the declaration does not mean you have finished the variable definition.

A variable must be initialized before it is put to use. Otherwise, you may get an arbitrary leftover value when you read it.

The first assignment to a variable is called "initialization".

Only with both declaration and initialization can we say that a variable definition is finished.

From the forms listed above, we can see that initialization can be done together with declaration.

int a = 10;
2.1.2.2.2.9.5 Assignment Operations

Assignment is an operation specific to variables.

The simplest one uses the same notation as an equation in math. We simply call it the assignment operation.

Operation  Description  Form
=          Assignment   A = val

After the program finishes an assignment operation, the value stored in the variable is replaced.

int i = 20;
printf("%d", i);
// => 20
i = 9;
printf("%d", i);
// => 9

So this is the meaning of "variable": a space that can store a value. An assignment operation finds that space and replaces the value inside. It is like a drawer that can hold exactly one thing: you may put one thing inside, and later you may clear the drawer and put a new one in.

2.1.2.2.2.9.6 Composed Assignment Operations

Beyond the regular assignment operation, there are some more advanced ones. You may combine the assignment operation with arithmetic and bitwise operations. This gives us the compound assignment operations.

Operation  Description                Form       Equivalent Form
+=         Addition Assignment        A += val   A = (typeof(A))(A + val)
-=         Subtraction Assignment     A -= val   A = (typeof(A))(A - val)
*=         Multiplication Assignment  A *= val   A = (typeof(A))(A * val)
/=         Division Assignment        A /= val   A = (typeof(A))(A / val)
%=         Modulus Assignment         A %= val   A = (typeof(A))(A % val)
^=         Bitwise XOR Assignment     A ^= val   A = (typeof(A))(A ^ val)
|=         Bitwise OR Assignment      A |= val   A = (typeof(A))(A | val)
&=         Bitwise AND Assignment     A &= val   A = (typeof(A))(A & val)
<<=        Shift Left Assignment      A <<= val  A = (typeof(A))(A << val)
>>=        Shift Right Assignment     A >>= val  A = (typeof(A))(A >> val)

The increment and decrement operations behave much like addition assignment and subtraction assignment with a value of 1:

int a = 0;
a++;    // a => 1
a += 1; // equivalent form, a => 2
--a;    // a => 1
a -= 1; // equivalent form, a => 0
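
One difference is worth a quick sketch: when the result of the expression itself is used, a++ yields the old value while ++a yields the new one. The two function names below are hypothetical and exist only for this illustration; each packs b and a into one number so both can be checked at once.

```c
/* Postfix: b receives the OLD value of a, then a is increased. */
int postfix_result(void) {
    int a = 5;
    int b = a++;          /* b == 5, a == 6 */
    return b * 10 + a;    /* 5 * 10 + 6 */
}

/* Prefix: a is increased first, then b receives the NEW value. */
int prefix_result(void) {
    int a = 5;
    int b = ++a;          /* b == 6, a == 6 */
    return b * 10 + a;    /* 6 * 10 + 6 */
}
```

Under these assumptions postfix_result() gives 56 and prefix_result() gives 66. When the increment stands alone as a statement, as in the lines above, the two forms do the same thing.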
2.1.2.2.2.10 Type Conversion

As mentioned before, C is a typed language. Variables of different types occupy different amounts of space.

So, for a variable of type int to be used as a long, its value must be converted to type long. This is called type conversion.

In the types section we learnt about type boost, which is a special kind of automatic type conversion. Automatic conversion always goes from the smaller range to the larger one. When we want a conversion in the other direction, we need forced type conversion (a cast).

To convert a value from one type to another, write the target type in parentheses before the expression.

(int)10ll; // same as 10
(char)12;  // same as '\14' (octal 14 is decimal 12)

char c;
int i = 3000;
c = (char)i; // 3000 does not fit in a char; typically only the low byte is kept

But forced type conversion has a serious problem: it may lose information. Converting from int to char goes from a larger range to a smaller one, and it typically just discards the higher part of the int value. The opposite case, short converted to int, simply puts all the data into the lower part of the int, and everything is fine.

For example,

  Short: 0010'0000 1000'0011 =>
  Char:  1000'0011
  Int:   0000'0000 0000'0000-0010'0000 1000'0011

This may cause unexpected results.

Conversion from real numbers to integers has the same kind of problem: everything after the decimal point is dropped.
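
The two kinds of loss can be sketched as follows. The function names low_byte and whole_part are made up for this example, and unsigned char is used instead of char so that the result does not depend on whether char is signed on a given platform; an 8-bit char is assumed.

```c
/* Keeps only the lowest 8 bits of the value, assuming an 8-bit char. */
unsigned char low_byte(int value) {
    return (unsigned char)value;
}

/* Drops everything after the decimal point (truncation toward zero). */
int whole_part(double value) {
    return (int)value;
}
```

With the bit pattern from the example above, low_byte(0x2083) gives 0x83, and whole_part(3.99) gives 3 rather than 4, since the conversion truncates instead of rounding.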

2.1.2.2.2.11 Input And Output

Programs do not only calculate; they also have to report the result. Thus input and output utilities are indispensable.

The most useful input and output facilities in C are the printf and scanf functions.

2.1.2.2.2.11.1 printf

printf stands for "print with format"; it is a formatted output function.

Basically, the job of printf is to display information on the screen. Its more advanced feature is formatting the output string.

2.1.2.2.2.11.1.1 Output

The most basic usage of printf is written as follows:

printf("output string")

Anything inside the quotation marks (the string delimiters), except '%', is displayed as is.

For example, this printf prints "output string" to the terminal, the black-background window on your computer.

The name "terminal" comes from hardware of long ago.

One thing you must notice is that the example shown here is just an expression, not a statement. To make it work, you have to add a semicolon, ';', after the whole expression.

In most cases, the terminal starts a new line when it receives a carriage return, a line feed, or both. But printf never adds any of these after the content has been printed. So, to make the output look normal, you need to add a newline mark at the end of the string:

printf("string with new line mark at end\n")

Besides the end of the string, a newline mark can also be placed inside it.

printf("string\nwith new line mark inside\n")

This does the same as the following:

printf("string\n");
printf("with new line mark inside\n");

(Why do we add a semicolon at the end of each line here? Because two separate expressions cannot be written one after another like this within a single statement.)

2.1.2.2.2.11.1.2 Placeholder & format

And what about the advanced features?

The format feature is provided by placeholders. Remember that '%' was mentioned before? The percent sign introduces a placeholder, which is why it cannot be printed directly with printf. To print '%' on the screen, write it as "%%" in the format string, the first argument given to printf.

Since printf means "print with format", placeholders must do more than keep the percent sign from being printed literally. So let us look at placeholders more closely.

As we know, C classifies data into different types. Placeholders therefore need different forms so that printf can tell the types apart. The part of a placeholder that does this is called the "type specifier". A full placeholder is written according to this syntax:

<placeholder> ::= '%' [flags] [width] ['.' precision] [length] <type specifier>
flags         ::= '-' | '+' | space | '#' | '0'
width         ::= <number> | '*'
precision     ::= <number> | '*'
length        ::= 'h' | 'l' | 'll' | 'L'

Looks complex? Just take a quick glance and move on; examples say more than the standard:

Type specifier  Description                         Form  Expected Data
a, A            Output reals in hexadecimal         %a    Reals: float, double
d               Output integer in decimal           %d    Integers: char, short, int
o               Output integer in octal             %o    Integers: char, short, int
x, X            Output integer in hexadecimal       %x    Integers: char, short, int
u               Output unsigned integer in decimal  %u    Unsigned integers: unsigned char, unsigned short, unsigned int
f               Output reals in decimal             %f    Reals: float, double
e, E            Output reals in exponent form       %e    Reals: float, double
g, G            Output reals in the shorter form    %g    Reals: float, double
c               Output character                    %c    Character: char
s               Output character string             %s    String: char[]
p               Output address                      %p    Pointer: *

And their long variants:

Type specifier  Description                         Form   Expected Data
ld              Output integer in decimal           %ld    Integers: long
lo              Output integer in octal             %lo    Integers: long
lx, lX          Output integer in hexadecimal       %lx    Integers: long
lu              Output unsigned integer in decimal  %lu    Unsigned integers: unsigned long
lld             Output integer in decimal           %lld   Integers: long long
llo             Output integer in octal             %llo   Integers: long long
llx, llX        Output integer in hexadecimal       %llx   Integers: long long
llu             Output unsigned integer in decimal  %llu   Unsigned integers: unsigned long long
lf              Output reals in decimal             %lf    Reals: double (same as %f)
le, lE          Output reals in exponent form       %le    Reals: double (same as %e)
lg, lG          Output reals in the shorter form    %lg    Reals: double (same as %g)
%               Output a literal %                  %%     None

Here is the flags part:

Flag     Description                                                      Form  Expected Data
-        Align left (default is right)                                    %-d   None
+        Always show the sign; by default '+' is not shown for positives  %+d   None
(space)  Insert a space before a non-negative number                      % d   None
#        Show the '0', '0x' or '0X' prefix with 'o', 'x', 'X';            %#x   None
         force a decimal point with 'e', 'E', 'f';
         keep trailing zeros with 'g', 'G'
0        Pad with 0 instead of spaces (needs a width to have an effect)   %05d  None

Width, .precision and length:

  • (number), e.g. %8d: minimum number of characters to print, padded with spaces. Output longer than this value is not truncated.
  • *, e.g. %*d: the width is not given in the format string but passed as an extra int argument before the value to be formatted.
  • .number, e.g. %.10d, %.f: the precision.
      for d, i, o, u, x, X: minimum number of digits; shorter values are padded with leading zeros, longer values are unaffected. With precision 0, the value 0 prints nothing.
      for e, E, f: number of digits after the decimal point.
      for g, G: maximum number of significant digits.
      for s: maximum number of characters to print; by default all characters are printed up to the terminating '\0'.
      for c: no effect.
      When no precision is given, the default for integers is 1; a '.' with no number means 0.
  • .*, e.g. %.*d, %.*f: the precision is not given in the format string but passed as an extra int argument before the value to be formatted.
  • h, e.g. %hd: the argument is treated as short, for i, d, o, u, x, X.
  • l, e.g. %ld: the argument is treated as long, for i, d, o, u, x, X; as double for f (same as plain %f); as a wide character for c; as a wide-character string for s.
  • ll, e.g. %lld: the argument is treated as long long, for i, d, o, u, x, X.
  • L, e.g. %Lf: the argument is treated as long double, for e, E, f, g, G.

printf returns the total number of characters it printed.
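
As a rough illustration of the flags, width and precision described above, the sketch below formats a few values into a buffer. It uses snprintf, the standard library sibling of printf that writes into a char array instead of the screen but takes the same format string and also returns a character count; the function name format_samples and the sample values are assumptions made for this example.

```c
#include <stdio.h>

/* Formats several sample values into `out`, separated by '|', and returns
   the number of characters written. `out` should hold at least 64 bytes. */
int format_samples(char *out, size_t size) {
    return snprintf(out, size, "%5d|%-5d|%05d|%+d|%.2f|%#x|%*d",
                    42,       /* %5d  : right aligned, width 5       */
                    42,       /* %-5d : left aligned, width 5        */
                    42,       /* %05d : padded with zeros            */
                    42,       /* %+d  : sign always shown            */
                    3.14159,  /* %.2f : two digits after the point   */
                    255u,     /* %#x  : hexadecimal with 0x prefix   */
                    4, 7);    /* %*d  : width 4 taken from argument  */
}
```

The buffer then holds "   42|42   |00042|+42|3.14|0xff|   7", and the return value is 36, the length of that text.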

You are now able to print the ASCII table using printf:

#include <stdio.h>

int main(void) {
  for (int i = 0; i < 128; i ++) {
    printf("ASCII: %5d, Char: %c;\n", i, i);
  }
}

The printf function is declared as:

int printf(const char * fmt, ...);

So, you can call it in any of these forms:

printf("format string")
printf("format string", arguments)
printf("format string", arguments, arg2)
printf("format string", arguments, arg2, arg3)
...
2.1.2.2.2.11.2 scanf

Now that we have learnt the output part, we should also take a look at the input part.

The usage of scanf is roughly like that of printf, except for how the arguments are passed: scanf needs to know where to store each value, so variables are passed with a leading &. scanf stands for "scan from format", so it needs placeholders just as printf does.

Placeholders are written in this form:

<placeholder> ::= '%' ['*'] [width][modifiers] <type specifier>

Somewhat like printf, right?

Part       Description                                                       Form  Expected Data
*          Discard the input: data matching the type is read but not stored  %*d   None
width      Maximum number of characters to read                              %8d   None
modifiers  Length modifiers for the type specifier, as in printf             %ld   None
type       What the data is scanned as                                       %d    None

Type specifier    Description                                                            Form                                Expected Data
a, A              Real number                                                            scanf("%a", &f)                     float
c                 Character; with a width, reads that many characters into a char array  scanf("%c", &c), scanf("%3c", buf)  char, char[]
d                 Integer written in decimal; '+' or '-' is optional                     scanf("%d", &i)                     int
ld                Integer written in decimal; '+' or '-' is optional                     scanf("%ld", &l)                    long
lld               Integer written in decimal; '+' or '-' is optional                     scanf("%lld", &ll)                  long long
e, E, f, F, g, G  Real number; '+' or '-' and an 'e' exponent are optional               scanf("%f", &f)                     float
i                 Integer; a 0 or 0x prefix selects octal or hexadecimal                 scanf("%i", &i)                     int
o                 Integer written in octal                                               scanf("%o", &i)                     int
s                 String, delimited by whitespace                                        scanf("%s", s)                      char[]
u                 Unsigned integer                                                       scanf("%u", &u)                     unsigned int
x, X              Integer written in hexadecimal                                         scanf("%x", &i)                     int
p                 Pointer                                                                scanf("%p", &p)                     *
[]                Set of accepted characters, a simplified regular expression            scanf("%[1-9]", s)                  char[]
%                 A literal %                                                            scanf("%%")                         None

Sample question: A+B Problem:

#include <stdio.h>

int main(void) {
  int a, b;
  scanf("%d%d",&a, &b);
  printf("%d + %d = %d", a, b, a + b);
  return 0;
}
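
scanf also returns a value: the number of items it managed to store, which makes it possible to detect bad input. The sketch below shows the idea with sscanf, the standard variant that reads from a string instead of the keyboard but uses the same placeholders. The function name parse_sum is hypothetical, and the if statement it uses is explained in the next section.

```c
#include <stdio.h>

/* Reads two integers from `text` and stores their sum in *sum.
   Returns 1 on success and 0 when the text does not contain two integers. */
int parse_sum(const char *text, int *sum) {
    int a, b;
    if (sscanf(text, "%d%d", &a, &b) != 2) {
        return 0;   /* fewer than two values were converted */
    }
    *sum = a + b;
    return 1;
}
```

For the text "3 4" the function stores 7 and returns 1; for "3 x" only one value can be converted, so it returns 0 and leaves the sum untouched.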
2.1.2.2.2.12 Conditional Statement

A program is not only a tool for calculating; it also helps people solve problems that require decisions.

For this, C provides conditional statements. They decide what to do according to conditions.

2.1.2.2.2.12.1 If

The if statement has the form:

if (condition) statement

When the condition expression evaluates to true (any non-zero value), the statement part is executed.

if (x < y)
  printf("x less than y");

Here x < y is the condition expression, and if x is indeed less than y, the program outputs the message.

But this is only the simplest case. What if we want to execute multiple statements within an if statement?

Remember code blocks? A code block groups several statements into one. So:

if (max < x) {
  swap(x, max);
  printf("x larger than current max, swap them");
}

Here, two statements are executed when x is larger than the current max value. (swap stands for any routine that exchanges the two values.)

2.1.2.2.2.12.2 If-Else

Besides the plain "if" statement, sometimes we need an "else" part.

if (condition)
  then-statement
else
  else-statement

Just as with the if statement, when the condition is non-zero (true), the then-statement is executed; otherwise, the else-statement is executed.

You may also need to distinguish several cases, which you can write like this:

if (cond1)
  then1-statement
else if (cond2)
  then2-statement
else if (cond3)
...
else
  else-statement

This is simply a chain of nested if-else statements: each "else if" is a new if statement placed in the else part of the previous one. It is written flat for readability, but you can also write it like this:

if (cond1) {
  then1
} else {
  if (cond2) {
    then2
  }
  ...
}

Very clear.
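
A concrete else-if chain might look like the sketch below. The grading rule and the name grade_for are invented for this example; the point is only that the conditions are tested from top to bottom and the first one that holds wins.

```c
/* Hypothetical grading rule: 90 and above is 'A', 60 and above is 'B',
   everything else is 'C'. */
char grade_for(int score) {
    char grade;
    if (score >= 90)
        grade = 'A';
    else if (score >= 60)   /* only reached when score < 90 */
        grade = 'B';
    else                    /* only reached when score < 60 */
        grade = 'C';
    return grade;
}
```

Because the second test is only reached when the first one failed, there is no need to write score >= 60 && score < 90.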

2.1.2.2.2.12.3 Ternary if-else operator

This operator is usually just called the ternary operator.

In most cases if-else statements are enough, but if-else is still a statement, not an expression. So in some corner cases, writing with if-else results in more lines of code and more complexity.

For this we have the ternary if-else operator. With it you get an expression, which you can then combine with other expressions.

The ternary if-else looks like this:

condition ? then : else

When the condition is true, the then part is evaluated; when it is false, the else part is evaluated. The value of the evaluated part becomes the value of the whole expression.

So, you may write:

int i = 10;
i = i - 100 < 0 ? 0 : i - 100;

Or, in C++, you may find that you can write this (C++ must be mentioned explicitly here, because this style of ternary is not allowed in pure C, and many programmers do not distinguish between C and C++):

int i = 0;
int j = 10;
(i < j ? i : j) = 1;

(The second case is valid in C++ because there the ternary expression yields an assignable object, an lvalue, when both branches are variables of the same type, so the result can be assigned to.)

Both are correct in their own language, but the second one is not encouraged.
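
Since the ternary operator produces a value, it fits naturally into a return statement. The two small functions below are only a sketch with made-up names; the second one wraps the i - 100 example from above.

```c
/* Returns the larger of two integers. */
int max_of(int a, int b) {
    return a > b ? a : b;
}

/* Subtracts 100, but never goes below zero. */
int subtract_100_floor_zero(int i) {
    return i - 100 < 0 ? 0 : i - 100;
}
```

For instance, max_of(3, 7) is 7, subtract_100_floor_zero(10) is 0, and subtract_100_floor_zero(250) is 150.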

2.1.2.2.2.12.4 Switch-Case

In addition to the if-else statement, we also have the switch-case statement.

switch (object) {
  case label:
    statements
  case label:
  ...
}

A label is either "case literal-value" or "default", and braces are not needed when one case has multiple statements. Each label marks an entry point: when the object matches a label, execution starts from the position of that label and continues until it meets a break statement.

A legal switch-case statement may then look like this:

int i = 4; // try other values here
switch (i) {
  case 1:
  case 2:
    printf("less than 3\n");
    break;
  case 4:
    printf("larger than 3\n");
    // no break: execution falls through into case 5
  case 5:
    printf("larger than 4\n");
    // no break: execution falls through into default
  default:
    printf("do nothing\n");
    break;
}
2.1.2.2.2.12.4.1 Break statement

But what does the break statement do?

The break statement has two uses. The first is the one here: break jumps out of the execution sequence of a switch-case statement.

When the object matches a label, C executes every statement after that label until it reaches the closing brace. In most cases you do not want that. So break interrupts the process: when a break statement is executed, control simply jumps out of the switch-case statement, and the remaining statements inside are not executed.

Although break is not mandatory in switch-case, it is a good habit to end each case with one.
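
To make the effect of a missing break visible, the sketch below mirrors the shape of the switch shown earlier but counts the branches that run instead of printing. The function name branches_run is an assumption for this example.

```c
/* Returns how many branches are executed for a given value of i. */
int branches_run(int i) {
    int count = 0;
    switch (i) {
    case 1:
    case 2:
        count += 1;
        break;          /* stops here for 1 and 2 */
    case 4:
        count += 1;
        /* falls through */
    case 5:
        count += 1;
        /* falls through */
    default:
        count += 1;
        break;
    }
    return count;
}
```

Entering at case 4 runs three branches, entering at case 5 runs two, and any other value besides 1 and 2 runs only the default branch.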

2.1.2.2.2.13 Loop

What if you want to execute the same, or nearly the same, statements many times? Here we need loops.

A loop is a statement that executes other statements repeatedly according to some condition.

2.1.2.2.2.13.1 While

The while loop looks similar to the if statement,

while (condition)
  loop-body

and it works similarly as well: while the condition is true, the loop body is executed again and again, and the condition is tested again before each round.

Another similarity between the while loop and the if statement is that the loop body is still a single statement. If you want multiple statements to be executed, you must wrap them in braces.

while (1) {
  printf("infinite loop\n");
}
2.1.2.2.2.13.2 For

The for loop is another type of loop; its name may not make its purpose obvious:

for (initial; condition; update)
  loop-body

A for loop always has four parts.

The initial part lets us define loop variables and initialize them; it runs once, before the loop starts. The condition part is the same as in a while loop: if it is true, the body is executed; otherwise the loop ends. The loop body, as with if and while, is executed when the condition holds. Finally, the update part runs each time the loop body finishes, to update the loop variables before the condition is tested again.

for (int i = 0; i < 10; i ++) {
  printf("%d", i);
}

Another important point is that, of the four parts of a for loop, the initial, condition, and update parts can each be empty. Thus, in a special case,

for (;;)
  body

can be seen as an infinite loop, because an empty condition counts as true.

2.1.2.2.2.13.3 Do-While

But what if we need to execute the body at least once?

Then we need the do-while loop.

do {
  body
} while (condition);

Unlike the other loops, the do-while loop tests its condition after the body, and it needs a semicolon after the closing parenthesis. The body is normally written with braces.
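
The difference from a while loop only shows when the condition is false from the start. The two functions below, with names chosen just for this sketch, count how many times each kind of loop runs its body.

```c
/* Counts the rounds of a while loop that runs as long as n < 3. */
int while_rounds(int n) {
    int rounds = 0;
    while (n < 3) {
        rounds++;
        n++;
    }
    return rounds;
}

/* Same loop written as do-while: the body runs before the first test. */
int do_while_rounds(int n) {
    int rounds = 0;
    do {
        rounds++;
        n++;
    } while (n < 3);
    return rounds;
}
```

Starting from 0, both run three rounds. Starting from 5, the while loop runs zero rounds, while the do-while loop still runs once.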

2.1.2.2.2.13.4 Break

Here is the other use of break: when a break statement is executed within the body of a loop, it jumps out of the whole loop. Everything after the break is skipped, even the update part of a for loop.

This is similar to its behaviour in switch-case.

2.1.2.2.2.13.5 Continue

Sometimes you only need to skip the rest of the body without leaving the loop; then you need the continue statement.

When continue is executed, the loop goes straight to its next round: it does the update (in a for loop), tests the condition, and starts a new execution of the body.
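
Both statements can be seen working together in the following sketch, where the function name sum_odd_up_to is made up for illustration. The for loop has an empty condition, so it can only end through break.

```c
/* Sums the odd numbers 1, 3, 5, ... that do not exceed `limit`. */
int sum_odd_up_to(int limit) {
    int sum = 0;
    for (int i = 1; ; i++) {   /* empty condition: ends only via break */
        if (i > limit) {
            break;             /* leave the whole loop */
        }
        if (i % 2 == 0) {
            continue;          /* skip even numbers, go on with i++ */
        }
        sum += i;
    }
    return sum;
}
```

With a limit of 5 the loop adds 1, 3 and 5, so the result is 9.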

2.1.2.2.2.14 Array

When we deal with a small amount of data, defining a few variables is enough. But what about a sequence of data?

For example, reading the scores of over 500 students and sorting them.

The average and the maximum can be computed with only one or two variables, but sorting requires storing all the values.

Arrays are linear, contiguous data structures for storing values of the same type.

The definition of a one-dimensional array is written as follows:

type name[length];

Arrays can also be multi-dimensional.

type name[length][length];
type name[length][length][length];
...

Once we define an array, it holds length elements. You can access them by index, which starts at 0 and ends at length - 1:

name[idx];

Each element can be treated as a regular variable whose type is the element type used to define the array.

We can then traverse the array using a loop:

int arr[10];
for (int i = 0; i < 10; i ++) {
  arr[i] = i;
}

Then, how can we initialize an array?

There are two main ways:

type name[] = {value1, value2, ...};
type name[length] = {value1, value2, ...};

type name[][length] = {value1, value2, ..., value6, ...};
type name[length][length] = {value1 ...};
type name[length][length] = {{value1, ...}, {value_length, ...}};
...

One way is to omit the length and just wrap the initial values in braces; the array then has as many elements as there are initial values. The other way is to specify the length and also provide initial values wrapped in braces; any elements without an initial value are set to zero.

For multi-dimensional arrays, you must specify the length of every dimension except the first. You can write the initial values directly in one pair of braces, or separate the elements of each row using nested pairs of braces.
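
As a small sketch of these rules, the function below (its name and values are invented for the example) declares a two-dimensional array with the first length omitted and only some of the values given, then adds up every element.

```c
/* Sums every element of a partly initialized 2 x 3 array. */
int partial_init_sum(void) {
    int grid[][3] = { {1, 2}, {4} };  /* first dimension is deduced as 2 */
    int sum = 0;
    for (int row = 0; row < 2; row++) {
        for (int col = 0; col < 3; col++) {
            sum += grid[row][col];    /* elements not listed are 0 */
        }
    }
    return sum;
}
```

The array holds 1, 2, 0 in the first row and 4, 0, 0 in the second, so the sum is 7.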

2.1.2.2.2.14.1 C Style String

Finally, we come to strings.

As mentioned before, strings and characters have a special relationship. In fact, strings in C are arrays of char.

C treats a char array that ends with '\0' (the character whose value is 0) as a string.
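
Because the end of a string is marked by '\0', its length can be found by walking the array until that marker appears. The function text_length below is a hand-written sketch of that idea; the standard library offers strlen for real use.

```c
/* Counts the characters before the terminating '\0'. */
int text_length(const char text[]) {
    int length = 0;
    while (text[length] != '\0') {
        length++;
    }
    return length;
}
```

For "hello" the result is 5: the array actually holds six chars, but the final '\0' is not counted.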

2.1.2.2.2.15 sizeof

Although it is possible to traverse arrays using a literal length, it is not that convenient.

To make this easier, we can use the sizeof operator:

sizeof(type)
sizeof(variable)
sizeof(array)

The sizeof operator yields the total size of the target type, variable, or array in bytes. So, to get the number of elements in an array, we can write:

int len = sizeof(array) / sizeof(type);
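
Put into a complete function, the idea might look like the sketch below. The name sum_with_sizeof and the sample values are assumptions, and sizeof(scores[0]) is used in place of sizeof(type) so the line stays correct even if the element type changes.

```c
#include <stddef.h>

/* Sums a local array whose length is computed with sizeof
   instead of being written as a literal. */
int sum_with_sizeof(void) {
    int scores[] = {3, 1, 4, 1, 5};
    size_t len = sizeof(scores) / sizeof(scores[0]);  /* 5 elements */
    int sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += scores[i];
    }
    return sum;
}
```

Note that this only works where the array itself is visible. Once an array is passed to a function it is seen as a pointer, and sizeof then gives the size of the pointer, not of the array.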
2.1.2.2.2.16 Iterator

To traverse arrays, using an index variable idx is one possible method. The other way to achieve the goal is to use an iterator.

int a[10];
for (int*p = a; p < a + 10; p ++) {
  *p = 1;
}

Here we defined p as an iterator for array a, and it is able to walk through the whole array.

The p here is called a pointer to int.

More details will be covered in the Pointers section.

2.1.2.2.2.17 Function

A function is a kind of contract: it accepts some input and generates output. Much like its mathematical form, the same input given to a function results in the same output. Furthermore, the notation of a function is almost the same as in math:

int func(int R);

You may think of it as a function $f: N \to N$, that is, $f(x) \in N$ for $x \in N$. And

float func(float a, float b);

may represent a function $f: R \times R \to R$, with $f(v) \in R$, $v = (a, b)$, $a, b \in R$.

Formally, the input of a function in C is zero or more parameters, and the output is the so-called "return value". There are also ways to pass output values other than the regular return.

Ideally, a function does not affect anything outside itself; such a function is called a pure function. But in real programs, functions may need to do more than calculate, for example I/O. Any operation that modifies memory or variables outside the function's own scope, or performs I/O, is a side effect of the function.

In particular, some functions in C return nothing at all and are called only for their side effects.

2.1.2.2.2.17.1 Definition

To understand functions in C, first look at the function declaration.

A function declaration works much like a variable declaration, but its purpose is to tell the compiler a function's name, return type and parameters, rather than to allocate any space.

We call it a prototype.

<return-type> <function-name>(<parameters> ...);

Usually, prototypes are placed in headers.

For example, the prototype for a function add that produces the sum of two integers looks like:

int add (int a, int b);

Here we declare the function add, which accepts two arguments, corresponding to parameters a, and b respectively.

Then, just as variables must be initialized before they are referenced, functions must be implemented before they can be called.

A function implementation (its definition) looks roughly like the declaration, with an extra function body:

<return-type> <function-name> (<parameters> ...) {
  <function-body>...
}

The body consists of regular statements, possibly including return statements.

The purpose of the return statement is to tell the program which value is the return value of the function.

It plays the role of the equals sign in $f(x, y) = x + y$.

Here we implement the function add:

int add (int a, int b) {
  return a + b;
}
2.1.2.2.2.17.2 Function Calling

Once a function has been defined, it can be used in our program with function call syntax.

As we mentioned at the very beginning of this tutorial, a function call is written in this form:

<function-name> (<arguments> ...)

The arguments must match the parameters in order and type.

For example, given the function add defined before,

int add(int a, int b){
  return a + b;
}

Then we can use it like:

#include <stdio.h>

int main(void) {
  int a = 10;
  a = add(a, 20);
  printf("%d", a);
  return 0;
}

The first argument we provide to add is the integer variable a, which has the same type as the parameter a. The second argument is the literal 20; since an integer literal without a suffix has type int in C (when it fits), it has the same type as the parameter b. Thus, the function call is accepted.

But what if we provide too few or too many arguments, or arguments of the wrong type? With a prototype in scope, the compiler reports an error for a wrong number of arguments; an argument of a mismatched type is converted implicitly when possible and rejected otherwise.

2.1.2.2.2.17.3 Recursion

Since a function can be called within the body of other functions, it makes no sense to prevent a function from calling itself.

A function that calls itself is called a recursive function.

For example, factorial function can be defined using recursion:

int factorial(int n) {
  if (n == 0) {
    return 1;
  } else {
    return n * factorial(n - 1);
  }
}

The basic structure of a recursive function is the same as that of a normal function; the only difference is that it calls itself within its body.

But since a recursive function could call itself infinitely many times, it must have a termination condition to stop further calls.

Here the if statement works as the termination condition. When n equals 0, the function returns 1 directly, without calling itself again.

2.1.2.2.2.17.4 Function Tail Call Optimization

In some cases, the last operation of a function is a call to another function; this is called a tail call.

If the last operation of a function is a call to itself, it is called tail recursion.

Normally, very deep or infinite recursion results in a stack overflow, but with tail call optimization the compiler can reuse the current stack frame for a tail call and avoid that.

A related technique is Continuation-Passing Style, in which every call becomes a tail call.
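To make the idea concrete, here is a sketch of factorial rewritten so that the recursive call is a tail call, next to the loop a compiler can reduce it to. The function names are invented for this example. The C standard does not require tail call optimization, so whether the frame is actually reused depends on the compiler and its optimization level.

```c
/* Tail-recursive helper: the recursive call is the last operation,
   and the partial result travels in the accumulator `acc`. */
static unsigned long factorial_acc(unsigned int n, unsigned long acc) {
    if (n == 0) {
        return acc;
    }
    return factorial_acc(n - 1, acc * n); /* tail call */
}

unsigned long factorial_tail(unsigned int n) {
    return factorial_acc(n, 1UL);
}

/* What tail call optimization effectively turns the helper into. */
unsigned long factorial_loop(unsigned int n) {
    unsigned long acc = 1UL;
    while (n != 0) {
        acc *= n;
        n--;
    }
    return acc;
}
```

The earlier `n * factorial(n - 1)` form is not a tail call, because the multiplication still has to happen after the recursive call returns.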

2.1.2.2.2.17.4.1 Continuation-Passing Style

Continuation-Passing Style (CPS) is a style of programming where control is passed explicitly in the form of a continuation.
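C has no closures, so a continuation has to be spelled out by hand. The sketch below is one possible encoding, not a standard one: a continuation is a function pointer plus the data it captured, and `struct cont`, `factorial_cps` and `factorial_via_cps` are hypothetical names. Instead of returning a value, each step hands its result to the continuation it was given.

```c
#include <stddef.h>

/* A continuation: a function plus the environment it needs. */
struct cont {
    long (*fn)(const struct cont *self, long value);
    const struct cont *next; /* continuation to run afterwards */
    long n;                  /* captured variable */
};

/* Final continuation: just return the value. */
static long cont_identity(const struct cont *self, long value) {
    (void)self;
    return value;
}

/* "Multiply by the captured n, then continue with next." */
static long cont_multiply(const struct cont *self, long value) {
    return self->next->fn(self->next, self->n * value);
}

static long factorial_cps(long n, const struct cont *k) {
    if (n == 0) {
        return k->fn(k, 1); /* pass the result on instead of returning it */
    }
    struct cont step = { cont_multiply, k, n };
    return factorial_cps(n - 1, &step);
}

long factorial_via_cps(long n) {
    struct cont done = { cont_identity, NULL, 0 };
    return factorial_cps(n, &done);
}
```

Because each `step` lives in the caller's stack frame, this C version still consumes stack; languages with heap-allocated closures and guaranteed tail calls avoid that.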

2.1.2.2.2.18 Assembly
2.1.2.2.2.18.1 Architecture
2.1.2.2.2.18.1.1 AMD64 (x86_64)
2.1.2.2.2.18.1.2 Aarch64 / arm64
2.1.2.2.2.18.1.3 MIPS / Loong
2.1.2.2.2.18.2 BUS
2.1.2.2.2.18.2.1 Bridges
2.1.2.2.2.18.3 CPU
2.1.2.2.2.18.4 Intel Syntax, AT&T Syntax
2.1.2.2.2.18.5 Memory Access
2.1.2.2.2.18.6 Commands
2.1.2.2.2.18.7 Direct Memory Access
2.1.2.2.2.19 Stack
2.1.2.2.2.19.1 Frames
2.1.2.2.2.19.2 Stack Variables, Local Variables
2.1.2.2.2.19.3 Recursion Function Expansion
2.1.2.2.2.20 Global Variables
2.1.2.2.2.21 Variable Scope
2.1.2.2.2.21.1 Dynamic Scope
2.1.2.2.2.21.2 Lexical Scope
2.1.2.2.2.21.2.1 Function Scope
2.1.2.2.2.21.2.2 Block Scope
2.1.2.2.2.22 Closure
2.1.2.2.2.23 Heap Space
2.1.2.2.2.23.1 Variable Allocation
2.1.2.2.2.24 Memory Management
2.1.2.2.2.24.1 Virtual Memory (OS)
2.1.2.2.2.25 Function Call
2.1.2.2.2.25.1 Function Stack
2.1.2.2.2.25.2 Function In Assembly
2.1.2.2.2.26 goto
2.1.2.2.2.27 User Defined Types
2.1.2.2.2.27.1 Struct
2.1.2.2.2.27.1.1 Bit Field
2.1.2.2.2.27.1.2 Simulate class Using Structure
2.1.2.2.2.27.1.3 Virtual Function Table
2.1.2.2.2.27.2 Enum
2.1.2.2.2.27.3 Union
2.1.2.2.2.28 Structure space, Memory Alignment & Offset
2.1.2.2.2.29 Pointers
2.1.2.2.2.29.1 Pointer offset, index & linked list
2.1.2.2.2.29.2 Array, Pointers Points To Continuous Memory
2.1.2.2.2.29.3 Function pointers
2.1.2.2.2.29.3.1 Form
2.1.2.2.2.29.3.2 Function As Function Pointer
2.1.2.2.2.29.3.3 Calling With Function Pointer
2.1.2.2.2.29.3.4 Simplified Function Call
2.1.2.2.2.29.4 Void Pointers
2.1.2.2.2.29.5 Pointer Convert
2.1.2.2.2.30 Pointer in Assembly
2.1.2.2.2.31 Exception
2.1.2.2.2.31.1 setjmp, longjmp
2.1.2.2.2.31.2 Try-Catch, Throw
2.1.2.2.2.31.3 SEH, Structured Exception Handling
2.1.2.2.2.31.4 Herbexception
2.1.2.2.2.31.5 Exception spread
2.1.2.2.2.31.6 Condition System
2.1.2.2.2.31.7 Continuation
2.1.2.2.2.32 Preprocessor
2.1.2.2.2.32.1 Header files, #include
2.1.2.2.2.32.2 Macro
2.1.2.2.2.32.2.1 C Style Macro
2.1.2.2.2.32.2.2 M4 Macro Language
2.1.2.2.2.32.2.3 C++ Template
2.1.2.2.2.32.2.4 Rust Procedure Macro
2.1.2.2.2.32.2.5 Rust Macro Rules
2.1.2.2.2.32.2.6 Macro Assembly, Pseudocode
2.1.2.2.2.32.2.7 Common Lisp Expansion Macro
2.1.2.2.2.32.2.8 Common Lisp Reader Macro
2.1.2.2.2.32.2.9 Scheme Hygiene Macro System
2.1.2.2.2.32.2.10 Scheme Syntax Rules
2.1.2.2.2.32.2.11 Scheme Syntax Case
2.1.2.2.2.32.2.12 Hygiene for the Unhygienic
2.1.2.2.2.32.3 Compiler Comments
2.1.2.2.2.32.4 #pragma
2.1.2.2.2.33 Meta-programming
2.1.2.2.2.34 Compiler
2.1.2.2.2.34.1 Compile Process
2.1.2.2.2.34.2 Compiler Driver
2.1.2.2.2.34.3 Assembler
2.1.2.2.2.34.4 Assemble
2.1.2.2.2.34.5 Assembly Code
2.1.2.2.2.34.6 Linker
2.1.2.2.2.34.7 Link
2.1.2.2.2.35 Executable File
2.1.2.2.2.35.1 Object
2.1.2.2.2.35.2 Executable
2.1.2.2.2.35.3 Executable File Format
2.1.2.2.2.35.3.1 Portable Executable (PE)
2.1.2.2.2.35.3.2 Executable Linkable Format (ELF)
2.1.2.2.2.35.3.3 Mach-O (Fat)
2.1.2.2.2.35.3.4 Common Object File Format (COFF)
2.1.2.2.2.35.3.5 Binary (Bin)
2.1.2.2.2.36 ABI
2.1.2.2.2.36.1 Function Call Conventions
2.1.2.2.2.36.1.1 __cdecl
2.1.2.2.2.36.1.2 __stdcall
2.1.2.2.2.36.1.3 __fastcall
2.1.2.2.2.36.1.4 thiscall
2.1.2.2.2.36.1.5 Microsoft 4-register fastcall, __vectorcall
2.1.2.2.2.36.1.6 System V ABI syscall
2.1.2.2.2.36.2 Function Naming Convention
2.1.2.2.2.36.2.1 C Function Naming Convention
2.1.2.2.2.36.2.2 MSVC C++ Function Naming Convention
2.1.2.2.2.36.2.3 Rust Function Naming Convention
2.1.2.2.2.36.2.4 Common Lisp Naming Convention
2.1.2.2.2.36.3 Endian
2.1.2.2.2.36.4 Dynamic Linked Library
2.1.2.2.2.36.5 Static Linked Library
2.1.2.2.2.36.6 fPIE, fPIC
2.1.2.2.2.37 Multiple File Compile
2.1.2.2.2.37.1 Compile Unit
2.1.2.2.2.37.2 Object
2.1.2.2.2.38 Build Systems
2.1.2.2.2.38.1 C Project Management
2.1.2.2.2.38.2 Makefiles
2.1.2.2.2.38.3 AutoTools
2.1.2.2.2.38.4 CMake
2.1.2.2.2.38.5 VSXMake (VSProj)
2.1.2.2.2.38.6 XMake
2.1.2.2.2.39 Variable Decorator
2.1.2.2.2.40 asm volatile (assembly code : output operands : input operands : clobbers)
2.1.2.2.2.41 __attribute__((attribute))
2.1.2.2.2.42 _Generic
2.1.2.2.2.43 ..., va_start, va_arg, va_end Macro, stdarg.h
2.1.2.2.2.44 __VA_ARGS__
2.1.2.2.2.45 Variable Length Array
2.1.2.2.2.46 ASCII, EBCDIC, Unicode/UCS-2
2.1.2.2.3  From The C Programming Language To Theoretical Computer Science (Section II) [S2]
2.1.2.2.3.1 From the C programming language to Theoretical Computer Science
2.1.2.2.3.1.1 Object-Oriented Programming
2.1.2.2.3.1.2 Generic Types
2.1.2.2.3.1.2.1 Template
2.1.2.2.3.1.2.2 Type Erasure
2.1.2.2.3.1.3 Inheritance
2.1.2.2.3.1.3.1 Class Object
2.1.2.2.3.1.3.2 Prototype Chain
2.1.2.2.3.1.4 Polymorphism
2.1.2.2.3.1.4.1 Interface
2.1.2.2.3.1.4.2 Trait
2.1.2.2.3.1.4.3 Duck Type
2.1.2.2.3.1.5 Encapsulation
2.1.2.2.3.1.5.1 Accessibility
2.1.2.2.3.1.6 Object System
2.1.2.2.3.1.6.1
2.1.2.2.3.1.7 Turing Machine
2.1.2.2.3.1.8 Lambda Calculus
2.1.2.2.3.1.9 First Order Function
2.1.2.2.3.1.9.1 Church numeral
2.1.2.2.3.1.10 Formal Verification
2.1.2.2.3.1.11
2.1.2.3 computer architecture
2.1.2.3.1  Stanford CS107: Programming Paradigm [S1]
2.1.2.3.1.1 Data Types and Conversion
2.1.2.3.1.1.1 Binary Numbers

For positive numbers, plain binary addition gives the correct result (as long as it stays within range).

For negative numbers, we need some way to represent the sign.

  1. Sign-magnitude: use the most significant bit as the sign, 0 for positive and 1 for negative.

    Simply setting the highest bit to 1 to mark a negative number can give incorrect results when it is added to a positive number, so ordinary binary addition cannot be used on this representation.

       00000000 00000111   (+7)
     + 10000000 00000111   (-7)
    ----------------------
       10000000 00001110   (-14)

    What we need is a representation in which a negative number plus the corresponding positive number gives 0 (with the carry out of the highest bit discarded).

  2. One's complement: invert every bit of the value.

    Adding a positive number to the negative number of the same magnitude gives all ones, which causes the +0 and -0 problem.

       00000000 00000111   (+7)
     + 11111111 11111000   (-7)
    ----------------------
       11111111 11111111   (0xffff)

  3. Two's complement: add 1 to the result of item 2 to get the desired representation; in practice, the 1 is added into the negative number.

       00000000 00000111   (+7)
     + 11111111 11111001   (-7)
    ----------------------
    (1)00000000 00000000   (0x0000)

    Mathematical meaning of two's complement: addition modulo 2^n forms an abelian group, and the negative number is the additive inverse of the positive integer.
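The 16-bit arithmetic above can be checked with a few lines of C. This is only a sketch; the function names are made up for the example, and unsigned types are used so that the wrap-around on overflow is well defined.

```c
#include <stdint.h>

/* Negate by the two's complement rule: invert all bits, then add 1. */
uint16_t twos_complement_negate(uint16_t x) {
    return (uint16_t)(~x + 1u);
}

/* One's complement: invert all bits only. */
uint16_t ones_complement_negate(uint16_t x) {
    return (uint16_t)~x;
}

/* 16-bit addition in which the carry out of the top bit is discarded. */
uint16_t add16(uint16_t a, uint16_t b) {
    return (uint16_t)(a + b);
}
```

With these, `add16(7, twos_complement_negate(7))` is 0, while the one's complement version yields 0xffff, the "negative zero" pattern.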

2.1.2.3.1.1.2 Characters

A character is itself just a number.

2.1.2.3.1.1.3 Convert

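Assigning a smaller type to a larger one roughly copies the value into the low-order bits of the larger type.

Assigning a larger type into the space of a smaller one simply discards the high-order bits.

When a negative value is widened, the high-order bits are filled with the sign bit (sign extension); for unsigned values they are filled with 0 (zero extension).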
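The three rules can be observed directly with fixed-width integer types. The sketch below uses hypothetical helper names; the casts make each conversion explicit.

```c
#include <stdint.h>

/* Narrowing: only the low-order 8 bits survive. */
uint8_t narrow_to_u8(uint32_t wide) {
    return (uint8_t)wide;
}

/* Widening a signed value copies the sign bit into the new high bits. */
uint32_t widen_signed(int8_t small) {
    return (uint32_t)(int32_t)small;
}

/* Widening an unsigned value fills the new high bits with 0. */
uint32_t widen_unsigned(uint8_t small) {
    return (uint32_t)small;
}
```

For instance, the byte 0xFF widens to 0xFFFFFFFF when it is treated as the signed value -1, but to 0x000000FF when it is treated as unsigned.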

2.1.2.3.1.1.4 Floats
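  1. Fixed-point binary fractions: a fixed number of bits represents the powers 2^{-n}.

    The number of bits available for the integer part and for the fractional part is fixed.

    Floating-point numbers instead use a finite number of bits and limited precision to approximate exact values in a dense number field.

  2. float 32: IEEE 754 base-2 floating-point number

    [sign] [<<--- magnitude --->>] [<-fractions>]
    [1/0 ] [exp(unsigned integer)] [base(2^{-n})]

    In practice, $\mathrm{val}_{10} = (-1)^{\mathrm{sign}} \times 1.\mathrm{base} \times 2^{\mathrm{exp} - 2^{\mathrm{bits}(\mathrm{exp}) - 1} + 1}$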
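The formula can be tried out by pulling a float apart into its three fields. This sketch assumes that `float` is the 32-bit IEEE 754 format (true on common platforms, but not guaranteed by the C standard); the struct and function names are invented for the example. For 8 exponent bits the bias is 2^7 - 1 = 127.

```c
#include <stdint.h>
#include <string.h>

struct float_fields {
    uint32_t sign;     /* 1 bit */
    uint32_t exponent; /* 8 bits, stored with a bias of 127 */
    uint32_t fraction; /* 23 bits */
};

struct float_fields decode_float(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits); /* copy the raw bit pattern */
    struct float_fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

/* Rebuild a normal value (not zero, subnormal, infinity or NaN). */
double rebuild_normal(struct float_fields f) {
    double mantissa = 1.0 + (double)f.fraction / 8388608.0; /* 2^23 */
    int e = (int)f.exponent - 127;
    double scale = 1.0;
    for (; e > 0; e--) scale *= 2.0;
    for (; e < 0; e++) scale /= 2.0;
    return (f.sign ? -1.0 : 1.0) * mantissa * scale;
}
```

As an example, -6.5 is -1.625 × 2^2, so its fields are sign 1, exponent 129 and fraction 0x500000.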

2.1.2.3.1.1.5 Endian

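The byte containing the most significant bits is called the big end, and the byte containing the least significant bits is called the little end.

Little-endian: the least significant byte is stored at the lowest address. Big-endian: the most significant byte is stored at the lowest address.

Big-endian matches the way humans read numbers.

What a pointer into a multi-byte value sees is affected by the byte order.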
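A common way to see the byte order of the current machine is to look at the first byte of a known multi-byte value. The following is a minimal sketch with made-up function names.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian machine, 0 otherwise. */
int is_little_endian(void) {
    uint32_t value = 0x01020304u;
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);
    return bytes[0] == 0x04; /* lowest address holds the least significant byte */
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap_bytes32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

`swap_bytes32` is the kind of conversion needed when data written on a machine of one byte order is read on a machine of the other.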

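2.1.2.3.1.2 Structure ( struct )

A pointer to a structure points to its starting address, and the other members are accessed by an offset from that starting address. (The starting address is the base address, similar to the base/offset relationship in assembly. In assembly the offset is in units of 0x10, while here it is in units of 0x1, and the offset of a member equals the total size of the members before it, plus any alignment padding.)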
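The base-plus-offset view can be made explicit with the standard `offsetof` macro. In this sketch `struct record` and `read_value_by_offset` are hypothetical; the exact offsets depend on the platform's alignment rules, so the code asks the compiler for them instead of assuming numbers.

```c
#include <stddef.h>
#include <stdint.h>

struct record {
    uint8_t  tag;   /* 1 byte */
    uint32_t value; /* 4 bytes, usually aligned to a 4-byte boundary */
    uint16_t flags; /* 2 bytes */
};

/* Read `value` through base address + offset instead of the member name. */
uint32_t read_value_by_offset(const struct record *r) {
    const unsigned char *base = (const unsigned char *)r;
    const uint32_t *member =
        (const uint32_t *)(base + offsetof(struct record, value));
    return *member;
}
```

This is what the compiler does for `r->value`: it adds a constant offset to the base address and loads from there.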

2.1.2.3.1.2.1 Array

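A pointer to an array points to its starting address, and the other elements are accessed by an offset from that address. This is broadly similar to a structure, except that the offset of element n equals n times the size of one element.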

2.1.2.3.1.2.2 Generic

C-style generics:

void swap(void* ap, void* bp, size_t size) {
  unsigned char tmp[size];  // scratch buffer (variable length array)
  memcpy(tmp, ap, size);    // memcpy is declared in <string.h>
  memcpy(ap, bp, size);
  memcpy(bp, tmp, size);
}

Compared with templates, C-style generics do not generate a separate binary for each instantiation of an algorithm with the same core, which avoids the binary bloat problem.

For lsearch, see [ulibs.c: binsearch_linear](https://github.com/mujiu555/ublis.c)

Example for generic:

void * lsearch (
  void* key,
  void * base,
  int n,
  int elem_size,
  int (* cmpfn)(void *, void *)
) {
  for (int i = 0; i < n; i ++) {
    // byte-wise pointer arithmetic: step elem_size bytes per element
    void * elemAddr = (char *) base + i * elem_size;
    if (cmpfn (key, elemAddr) == 0) {
      return elemAddr;
    }
  }
  return NULL;
}
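To show how such a generic routine is used, here is a sketch that searches an int array. `lsearch` is repeated in the same shape so the example stands alone; `int_cmp` and `find_int` are hypothetical helpers. The caller supplies the element size and a comparison callback, because the routine itself knows nothing about the element type.

```c
#include <stddef.h>

void *lsearch(void *key, void *base, int n, int elem_size,
              int (*cmpfn)(void *, void *)) {
    for (int i = 0; i < n; i++) {
        void *elem_addr = (char *)base + i * elem_size;
        if (cmpfn(key, elem_addr) == 0) {
            return elem_addr;
        }
    }
    return NULL;
}

/* Comparison callback for int elements: 0 means equal. */
int int_cmp(void *a, void *b) {
    int x = *(int *)a;
    int y = *(int *)b;
    return (x > y) - (x < y);
}

/* Returns the index of key in numbers, or -1 when it is absent. */
int find_int(int *numbers, int n, int key) {
    int *found = lsearch(&key, numbers, n, (int)sizeof(int), int_cmp);
    return found ? (int)(found - numbers) : -1;
}
```

Note that the key is passed by address as well: the callback always receives two pointers to elements, never the elements themselves.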
2.1.2.3.1.3 Stack
2.1.2.3.1.3.1 Stack with int
2.1.2.3.1.3.2 Generic Stack
2.1.2.3.1.4 Memory Management

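If the elements inside a generic stack must be destroyed when the stack itself is destroyed, the caller has to supply a free function for that purpose.

It is important to be clear about what is a pointer and what is the address of an element.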
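A possible shape for such a stack is sketched below. It is not the lecture's exact code: the type `stack`, its fields and the function names are assumptions made for this example. The important detail is that the free function receives the address of an element, so for a stack of `char *` it receives a `char **`.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    void *elems;                 /* raw storage for the elements */
    size_t elem_size;
    size_t len, cap;
    void (*free_fn)(void *elem); /* gets the ADDRESS of an element; may be NULL */
} stack;

int stack_new(stack *s, size_t elem_size, void (*free_fn)(void *)) {
    s->elem_size = elem_size;
    s->len = 0;
    s->cap = 4;
    s->free_fn = free_fn;
    s->elems = malloc(s->cap * elem_size);
    return s->elems != NULL;
}

int stack_push(stack *s, const void *elem) {
    if (s->len == s->cap) { /* grow by doubling */
        void *grown = realloc(s->elems, 2 * s->cap * s->elem_size);
        if (grown == NULL) return 0;
        s->elems = grown;
        s->cap *= 2;
    }
    memcpy((char *)s->elems + s->len * s->elem_size, elem, s->elem_size);
    s->len++;
    return 1;
}

void stack_dispose(stack *s) {
    if (s->free_fn != NULL) { /* destroy remaining elements first */
        for (size_t i = 0; i < s->len; i++) {
            s->free_fn((char *)s->elems + i * s->elem_size);
        }
    }
    free(s->elems);
    s->elems = NULL;
    s->len = s->cap = 0;
}

/* Free function for a stack whose elements are heap-allocated strings. */
void string_free(void *elem) {
    free(*(char **)elem); /* elem is a char **, not the string itself */
}
```

For a stack of plain ints, NULL would be passed as the free function, since the elements own no extra memory.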

2.1.2.3.1.5 Memory Segments

Software-managed memory:

When a program is loaded into memory, the heap part is managed by malloc, realloc and free.

The block allocated for you contains a few extra bytes just before its head: the metadata.

Thus, free(head + offset); is not allowed. free needs the metadata located just before the pointer that malloc returned, so passing an offset pointer will likely lead to a crash.

Furthermore, freeing an array is not allowed either. Arrays are allocated on the stack and managed by the compiler, so they carry no metadata.

The memory manager may split memory into segments and serve a request from a specific segment when the request is smaller than 2^n bytes.

2.1.2.3.1.5.1 Memory compose

Split off a large region of memory and serve allocations from it through handles. A handle is a pointer to a pointer to the actual memory, which lets the manager move blocks without invalidating what the client holds.

2.1.2.3.1.5.2 Stack segment

Stack depth is roughly proportional to the number of nested function calls.

When a variable or array is defined within a function such as main, it lives in that function's stack frame, and the stack top moves. (The stack grows towards lower addresses; similarly, the heap grows towards higher addresses.)

The stack top pointer marks the boundary between the stack and the gap. (The gap is the space between the heap and the stack.)

When a function is called, a stack frame is created for it; when the function exits, the stack top pointer goes back to where it was before the frame was created.

Relatively slow RAM (Compared to register):

High address        +-----------------------+       +-------------+
                    |                       |      /|    ARG-N    |
                    |                       |     / |    .....    |
                    |                       |    /  |    ARG-1    |
                    |                       |   /   |-------------|
                    |-----------------------|  /    |  <Ret Addr> | <- BP
                    |                       | /    .|-------------|
                    |         Stack         |/    / |   <Old SP>  |
                    |                       |    /  |-------------|
              BP -> |-----------------------| --`   |   Local-1   |
                    |         Frame         |       |   .......   |
              SP -> |-----------------------| ----. |   Local-N   |
                    |                       |\     `|-------------|
                    |                       | \     |    ARGs-N   | <- SP
                    |                       |  \    |             |
                    |                       |   \   |             |
                    |                       |    \  |             |
                    |                       |     \ +-------------+
                    |                       | <- "Gap"
                    |                       |
                    |                       |
                    |                       |
                    |                       |
                    |                       |
                    |-----------------------|
                    |                       |
                    |                       |
                    |         Heap          |
                    |                       |
                    |                       |
                    |                       |
                    |-----------------------|
                    |                       |
                    |     .Section code     |
                    |                       |
Low address         +-----------------------+
2.1.2.3.1.5.3 Memory Management

When allocating memory, the allocator reserves not only the memory you request but also some extra memory for metadata.

|   total space allocated    |
| head | space you allocated |
       ^
       pointer points to

Sometimes the memory manager uses part of a free block to store the metadata of that free block.

Allocation strategies:

  • Best fit
  • Worst fit
  • First fit
  • Continuous search

Sometimes the allocator returns more space than you asked for, but you can only rely on the space you requested.

Compact:

2.1.2.3.1.6 Section IX: Computer architecture

Suppose we have the code:

int i;
int j;

i = 10;
j = i + 7;
j ++;

Assuming memory segment:

       +-----------+
0xf000 |           |
0xeffc |           |
       |   | i |   | <- BP
       |   | j |   |
       |           | <- SP
       |           |
       |           |
       |           |
       |           |
       |           |
       +- - - - - -+
....
       +- - - - - -+
0x1000 |           |
       +-----------+

Assume i and j are packed together on the stack, and BP stores the stack base address.

To access variable i, use [SP+4]. Thus, i = 10; could be written as mov [sp+4], 10

For j = i + 7, we must first load i and then perform the ALU operation.

  • load i: mov r1, [sp+4]
  • add: mov r2, r1 followed by add r2, 7

Then, mov [sp], r2 stores the result into j. And j ++ becomes inc [sp]

2.1.2.3.1.6.1 Load / Store, ALU Operations
2.1.2.3.1.6.2 force conversion

A cast (forced conversion) only cheats the compiler, not the assembler. At the assembly level there are only addresses.

2.1.2.3.1.7 Activation record: function call frame

Given the function:

void foo(int bar, int * baz) {
  char sninke[4];
  short * why;
  // ...
}

The arguments corresponding to the parameters and the local variables are placed close to each other.

4 byte
        |         | baz
        |         | bar
        | < ret > |
        |         | sninke
        |         | why

When foo is called from another function, such as main:

int main (int argc, char * argv[]) {
  int i = 4;
  foo(i, &i);
  return 0;
}

We may have:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC <- sp

at the start.

Then, allocate space for variable i:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC
        |         |      <- sp

Assign to i:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC
        |    4    | i    <- sp

When calling foo, the arguments for foo are pushed onto the stack:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC
        |         | i
        | argument| i
        | argument| &i   <- sp
2.1.2.3.1.8 Section XI: Swap, call in assembly
void foo() {
  int x = 11;
  int y = 17;
  swap(&x, &y);
}

In assembly, with the _cdecl convention, arguments are pushed in reverse order:

_foo:
  push rbp
  mov rbp, rsp

  sub rsp, 8              ; x, y are 4 bytes each, total 8 bytes
  mov dword [rsp + 4], 11 ; x = 11
  mov dword [rsp], 17     ; y = 17

  lea rax, [rsp]          ; &y, the rightmost argument is pushed first
  push rax
  lea rax, [rsp + 12]     ; &x (rsp moved down 8 bytes after the push)
  push rax
  call _swap
  add rsp, 16             ; clean up stack after call

  mov rsp, rbp
  pop rbp
  ret

The function swap may be written as:

void swap(int * a, int * b) {
  int tmp = *a;
  *a = *b;
  *b = tmp;
}

After the call, the 8 bytes at the top of the stack hold the saved pc, followed by 16 bytes for the 2 arguments: a is at rsp + 8 and b is at rsp + 16, since the program runs on an x86_64 machine. The leftmost parameter lies closest to the top of the stack (the lowest address).

In c:

void __attribute__((naked)) swap(int *ap, int *bp) {

  asm volatile(
      // fetch arguments from stack
      "mov rbx, [rsp + 8];\n"
      "mov eax, [rbx];\n"
      "mov rbx, [rsp + 16];\n"
      "xchg eax, [rbx];\n"
      "mov rbx, [rsp + 8];\n"
      "mov [rbx], eax;\n"

      "ret;\n"
      :
      :
      : "rsi", "rdi", "memory");
}

void __attribute__((naked)) foo() {

  asm volatile(
      // initialize variables
      // push rbp;
      // mov rbp, rsp;
      // for better if possible
      "sub rsp, 8;\n"
      "mov dword ptr [rsp + 4], 11;\n"
      "mov dword ptr [rsp], 17;\n"

      "lea rax, [rsp];\n"      // &y: the rightmost argument is pushed first
      "push rax;\n"
      "lea rax, [rsp + 12];\n" // &x: rsp moved down 8 bytes after the push
      "push rax;\n"

      "call swap;\n"

      "add rsp, 16;\n" // clean up calling

      // clean up stack
      // also possible to use
      // `mov rsp, rbp; pop rbp;`
      // if bp is set
      "add rsp, 8;\n"
      "ret;\n"
      :
      :
      : "memory");
}

int main(int argc, char *argv[]) {
  foo();
  return 0;
}

The swap function here is not implemented the way the C code above is written; it uses xchg instead.

2.1.2.3.1.9 Pre-process, Compile, Assemble, Link

Source Code -> Preprocessed Code -> Assembly Code -> Object File -> Executable File

2.1.2.3.1.9.1 Preprocessor
2.1.2.3.1.9.1.1 #define

Replaces text that appears in the source file.

  1. constant replacement

    #define SIZE 1024
    char buf[SIZE];
  2. parameterized macro

    #define MAX(a, b) ((a) > (b) ? (a) : (b))
    int x = MAX(3, 5);
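Because a macro is plain text replacement, two pitfalls follow directly. The sketch below demonstrates both; `BAD_DOUBLE`, `GOOD_DOUBLE` and the three functions are made up for the example.

```c
/* The macro from above: every use of a and b is pasted in textually. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Without parentheses the expansion can bind to neighbouring operators. */
#define BAD_DOUBLE(x) x + x
#define GOOD_DOUBLE(x) ((x) + (x))

int bad_result(void)  { return 3 * BAD_DOUBLE(2); }  /* 3 * 2 + 2 */
int good_result(void) { return 3 * GOOD_DOUBLE(2); } /* 3 * ((2) + (2)) */

/* Arguments with side effects may be evaluated more than once. */
int max_with_side_effect(void) {
    int i = 5;
    int m = MAX(i++, 3); /* i++ appears twice in the expansion */
    return m * 10 + i;   /* m is 6 and i is 7 here, not 5 and 6 */
}
```

This is why macro parameters and the whole macro body are conventionally wrapped in parentheses, and why arguments such as `i++` should not be passed to function-like macros.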
2.1.2.3.1.9.1.2 #include
2.1.2.3.1.9.2 compiler
2.1.2.3.1.10 Section XIII:

What if we comment out #include <stdio.h>?

The program can probably still be compiled.

What if we comment out #include <assert.h>?

assert will be treated as a function, and the final object file will be missing the symbol.

void foo() {
  int i;
  int array[4];
  for (i = 0; i <= 7 /* for 32-bit alignment requirement in x86_64 Linux, there are 3-bits padding */; i ++) {
    array[i] = 0;
  }
}

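This will probably loop forever: the out-of-bounds writes are undefined behavior, and on a typical stack layout one of them overwrites i with 0.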

What will happen if

int Declare() {
  int array[100];
  for (int i = 0; i < 100; i ++) {
    array[i] = i;
  }
}

int Print() {
  int array[100];
  for (int i = 0; i < 100; i ++) {
    printf("%d", array[i]);
  }
}

The two functions have the same stack frame layout, so Print may appear to work correctly: Declare does not clear the bit pattern it leaves on the stack after returning. Note that reading uninitialized variables like this is undefined behavior.

This technique is called "channeling".

2.1.2.3.1.10.1 multiple arguments

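Arguments are pushed from right to left. This way the first argument always sits at a known offset from the stack pointer, which makes things easier for the compiler and allows functions with a variable number of arguments, such as printf.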

2.1.2.3.1.11 Multiple Threads

The operating system gives each process its own virtual memory, so that a program can assume it owns all of memory.

The kernel tracks and maintains the virtual memory mapping table, and the MMU uses it to map the virtual memory of each process to real memory.

Program execution is sequential.

When multiple processes share data, one of them may act on the data after another has already changed it. For example, a process reads a variable and checks that it satisfies some requirement; just as it is about to operate on it, the scheduler switches to another process, which operates on the variable successfully. When the scheduler switches back, the original process does not validate the variable again and performs the same operation on it, which causes an error.

This situation is called a race condition.

Code always contains some critical sections: while executing inside one, the code does not get a chance to validate the shared data again.

The solution is to protect the critical section with a semaphore or a lock. A process that wants to enter the critical section first tries to acquire the lock.

A semaphore is an integer variable with atomic operations. When it is 0, the process cannot enter the critical section; when it is greater than 0, the process can enter and atomically decreases the semaphore by 1. When leaving the critical section, the process increases the semaphore, releasing the resource.

Decrementing a semaphore acquires a resource.
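The "check that it is greater than 0 and decrease it, atomically" step is the heart of a semaphore. The sketch below shows that step with C11 atomics; it assumes a compiler with `<stdatomic.h>` support, and the `semaphore` type and function names are invented for the example. A real semaphore would block the caller when the count is 0; this simplified version only reports failure.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int count; /* number of available resources */
} semaphore;

void semaphore_init(semaphore *s, int initial) {
    atomic_init(&s->count, initial);
}

/* Try to take one resource; returns false instead of blocking. */
bool semaphore_try_wait(semaphore *s) {
    int current = atomic_load(&s->count);
    while (current > 0) {
        /* Succeeds only if nobody changed count since we read it;
           on failure `current` is refreshed and the check is repeated. */
        if (atomic_compare_exchange_weak(&s->count, &current, current - 1)) {
            return true;
        }
    }
    return false;
}

/* Give one resource back. */
void semaphore_signal(semaphore *s) {
    atomic_fetch_add(&s->count, 1);
}
```

The compare-and-exchange closes exactly the gap described above: the check and the update can no longer be separated by a switch to another thread or process.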

2.1.2.3.1.11.1 Producer Consumer Problem

A producer generates data and puts it into a buffer. A consumer takes data from the buffer and processes it.

The consumer should not take data when the buffer is empty. The producer should not put data when the buffer is full.

Use two semaphores to track the number of empty slots and full slots in the buffer.

2.1.2.3.1.11.2 Reader Writer Problem

Reader Writer problem is a classic synchronization problem. With two types of processes, readers and writers, readers can read shared data simultaneously, writers need exclusive access to shared data.

2.1.2.3.1.11.3 Philosophers Dining Problem

Every philosopher needs two forks to eat. Five philosophers sit around a table; when a philosopher wants to eat, he tries to pick up the left and right forks. But if all philosophers pick up their left fork first, none of them will ever be able to pick up the right fork.

This is a deadlock.

2.1.2.3.1.11.4 Ice cream Shop Problem
2.1.2.3.1.12 Functional Programming Paradigm

In the functional programming paradigm, each function is treated as a regular mathematical function, which accepts some input and produces some output.

;car
;cdr

car in Scheme extracts the first element of a list, while cdr extracts the rest of the list.

This is already well known, so for brevity Mujiu will not explain more about Scheme here.

In Scheme, as in Lisp generally, car and cdr come from the IBM 704, the machine on which Lisp was first implemented: car stands for "contents of the address part of register" and cdr for "contents of the decrement part of register".

2.1.2.4 programming paradigm
2.1.2.4.1  MIT 6.001: Structure and Interpretation of Computer Programs (SICP) [S1]

“ Computer science is not about computers, any more than astronomy is about telescopes, or biology about microscopes. ”

Computer science is neither about science nor about computers. Rather than a subject that explores the nature of computation itself, it is an engineering discipline focused on building systems that perform computations, that is, on how to use computers to solve problems.

It is like geometry, which originally focused on measuring land and later evolved into an abstract mathematical discipline that studies the properties of space and shapes.

The main problem computer science tries to solve is how to describe the process of computation.

In mathematics, functions describe relationships between quantities, but an equation does not tell us how to compute the value of a function. Computer science gives us a way to describe that process, to compute and solve the functions.

The main purpose is to find a way to formalize such processes. Some systems are so large and complex that nobody can fully understand the whole. That is why we build abstractions to manage the complexity of such systems. What makes this possible is the idea of procedures, which can be used to build abstractions: a technique for managing complexity.

The computer is a virtual environment that is not bound by real-world constraints, so a system can be built in any way we want. The only limits are our imagination and creativity. It is an ideal system.

2.1.2.4.1.1 Preface
2.1.2.4.1.2 Section 1: Building Abstractions with Procedures

The first way to build abstractions is with black boxes, i.e., procedures, which accept some inputs and produce some outputs without revealing the internal details of how the procedure works. Nowadays this is called encapsulation.

Fixed points: a fixed point of a function is a value that does not change under application of the function. What we want here is a way to compute such fixed points and to package that process into a procedure. How to achieve this is a piece of imperative ("how-to") knowledge. How do we apply such a procedure? How do we use it to find the fixed points of other functions? And how do we build new procedures on top of it?

In this chapter, we will talk about several topics:

  • Primitive elements
  • Combinations
  • Abstraction and how to build new abstractions
  • Extracting common patterns
2.1.2.4.1.2.1 Lisp

The main purpose of this section is not to learn to program in Lisp, but to learn how to think about programming. What we are about to learn is a general framework composed of primitives, means of combination, and means of abstraction.

Combinations of Lisp expressions are organized in a tree structure, i.e., S-expressions. P.S.: in a compiler, such a tree structure is called an Abstract Syntax Tree (AST).

2.1.2.4.1.2.2 define

The way to build new abstractions is to use define. By extracting general ideas from specific examples, it is possible to create new procedures.

2.1.2.4.2  Stanford CS107: Programming Paradigm [S1]
2.1.2.4.2.1 Data Types and Conversion
2.1.2.4.2.1.1 Binary Numbers

For positive numbers, simply adding them gives the result (as long as it stays in range).

For numbers that may be negative, we need some way to represent the sign:

  1. Sign-magnitude: take the most significant bit as the sign, 0 for positive and 1 for negative.

    If a negative number is represented simply by setting the most significant bit to 1, ordinary binary addition with a positive number can give an incorrect result.

       00000000 00000111   (+7)
     + 10000000 00000111   (-7)
    ---------------------
       10000000 00001110   (-14)

    We need a representation in which adding a negative number to the corresponding positive number gives 0 (with the carry out of the most significant bit discarded).

  2. One's complement: invert every bit of the value.

    Adding a positive number to the negative number with the same absolute value gives all 1s, which causes the +0 and -0 problem.

       00000000 00000111   (+7)
     + 11111111 11111000   (-7)
    ---------------------
       11111111 11111111   (0xffff)
  3. Two's complement: add 1 to the result of step 2 to get the desired representation. In practice, the 1 is added to the negative number.

       00000000 00000111   (+7)
     + 11111111 11111001   (-7)
    ---------------------
    (1)00000000 00000000   (0x0000)

    Mathematical meaning of two's complement: addition modulo 2^n forms an abelian group, and the two's complement of a positive integer is its additive inverse.

2.1.2.4.2.1.2 Characters

A character is itself just a number.

2.1.2.4.2.1.3 Convert

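Assigning a value of a smaller type to a larger type essentially copies the value into the low-order bits of the larger one.

Assigning a value of a larger type to a smaller type simply discards the high-order bits.

When a value is widened, the high-order bits are filled either with the sign bit (sign extension, for signed values such as negative numbers) or with 0 (zero extension).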

2.1.2.4.2.1.4 Floats
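  1. Fixed-point binary fractions: a fixed number of bits is used to represent the 2^{-n} places.

    The number of bits available for the integer part and for the fractional part is fixed.

    Floating-point numbers instead use a finite number of bits and finite precision to approximate exact values in a dense number field.

  2. float32: the IEEE 754 base-2 floating-point number

    [sign] [<<--- magnitude --->>] [<-fractions>]
    [1/0 ] [exp(unsigned integer)] [base(2^{-n})]

    In practice, val_{(10)} = (-1)^{sign} \times 1.base \times 2^{exp - 2^{bits(exp) - 1} + 1}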

2.1.2.4.2.1.5 Endian

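The byte that holds the most significant bits is called the big end, and the byte that holds the least significant bits is called the little end.

Little-endian: the little end (least significant byte) is stored at the lowest address. Big-endian: the big end (most significant byte) is stored at the lowest address.

Big-endian matches the order in which humans read numbers.

What a pointer into a multi-byte value reads is affected by byte order.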

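2.1.2.4.2.2 Structure (struct)

A pointer points to the start address of the structure, and the other members are accessed by their offset from that start address. (This is the base address; the relationship resembles base and offset addresses in assembly. In assembly the base is counted in units of 0x10, whereas here offsets are counted in units of 0x1, and a member's offset equals the total size of the members before it, plus any alignment padding.)

2.1.2.4.2.2.1 Array

A pointer points to the start address of the array, and the other elements are accessed by their offset from that start address. This is broadly similar to a structure, except that the offset of element n is n times the element size.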

2.1.2.4.2.2.2 Generic

C-style generics:

#include <string.h>

void swap(void* ap, void* bp, size_t size) {
  unsigned char tmp[size];  /* scratch buffer of size bytes (C99 variable-length array) */
  memcpy(tmp, ap, size);
  memcpy(ap, bp, size);
  memcpy(bp, tmp, size);
}

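Compared with templates, C-style generics do not need to generate a separate binary for each instantiation of an algorithm with the same core, so they avoid the binary bloat problem.

For lsearch, see [ulibs.c: binsearch_linear](https://github.com/mujiu555/ublis.c)

Example of a generic function: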

#include <stddef.h>

void * lsearch (
  void* key,
  void * base,
  int n,
  int elem_size,
  int (* cmpfn)(void *, void *)
) {
  for (int i = 0; i < n; i ++) {
    // step through the array in units of elem_size bytes
    void * elemAddr = (char *) base + i * elem_size;
    if (cmpfn (key, elemAddr) == 0) {
      return elemAddr;
    }
  }
  return NULL;
}
2.1.2.4.2.3 Stack
2.1.2.4.2.3.1 Stack with int
2.1.2.4.2.3.2 Generic Stack
2.1.2.4.2.4 Memory Management

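If the elements inside a generic stack need to be destroyed when the stack itself is destroyed, a free function must be supplied so that the stack can release them.

It is necessary to be clear about what is a pointer and what is an address.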

2.1.2.4.2.5 Memory Segments

Software-managed memory:

When a program is loaded into memory, the heap part is managed by malloc, realloc, and free.

The memory allocated for you contains a few extra bytes just before the head: the metadata.

Thus, free(head+offset); is not allowed. free needs the metadata stored just before the pointer that malloc returned, so passing a pointer with an offset may crash the program.

Furthermore, freeing an array is not allowed either, because a local array is allocated on the stack and managed by the compiler, and it has no such metadata.

The memory manager may split memory into segments and serve a request from a specific segment when the request is smaller than 2^n bytes.

2.1.2.4.2.5.1 Memory compose

A large region of memory is split up, and allocations are served through handles. A handle is a pointer to a pointer to the actual memory.

2.1.2.4.2.5.2 Stack segment

Stack depth is roughly proportional to the number of nested function calls.

When a variable or an array is defined within a function such as main, a stack frame is created and the stack top moves. (The stack grows towards lower addresses; similarly, the heap grows towards higher addresses.)

The stack top pointer marks the boundary between the stack and the gap. (The gap is the space between the heap and the stack.)

When a function is called, a stack frame is created for it; when the function exits, the stack top pointer goes back to where it was before the frame was created.

Relatively slow RAM (Compared to register):

High address        +-----------------------+       +-------------+
                    |                       |      /|    ARG-N    |
                    |                       |     / |    .....    |
                    |                       |    /  |    ARG-1    |
                    |                       |   /   |-------------|
                    |-----------------------|  /    |  <Ret Addr> | <- BP
                    |                       | /    .|-------------|
                    |         Stack         |/    / |   <Old SP>  |
                    |                       |    /  |-------------|
              BP -> |-----------------------| --`   |   Local-1   |
                    |         Frame         |       |   .......   |
              SP -> |-----------------------| ----. |   Local-N   |
                    |                       |\     `|-------------|
                    |                       | \     |    ARGs-N   | <- SP
                    |                       |  \    |             |
                    |                       |   \   |             |
                    |                       |    \  |             |
                    |                       |     \ +-------------+
                    |                       | <- "Gap"
                    |                       |
                    |                       |
                    |                       |
                    |                       |
                    |                       |
                    |-----------------------|
                    |                       |
                    |                       |
                    |         Heap          |
                    |                       |
                    |                       |
                    |                       |
                    |-----------------------|
                    |                       |
                    |     .Section code     |
                    |                       |
Low address         +-----------------------+
2.1.2.4.2.5.3 Memory Management

When allocating memory, the allocator reserves not only the memory you request but also some extra memory for metadata.

|   total space allocated    |
| head | space you allocated |
       ^
       pointer points to

Sometimes the memory manager uses the free space itself to store the metadata of free blocks.

Allocation strategies:

  • Best fit
  • Worst fit
  • First fit
  • Continuous search

Sometimes the allocator returns more space than you need, but you can only rely on the space you requested.

Compact:

2.1.2.4.2.6 Section IX: Computer architecture

Given the code:

int i;
int j;

i = 10;
j = i + 7;
j ++;

Assume the memory segment:

       +-----------+
0xf000 |           |
0xeffc |           |
       |   | i |   | <- BP
       |   | j |   |
       |           | <- SP
       |           |
       |           |
       |           |
       |           |
       |           |
       +- - - - - -+
....
       +- - - - - -+
0x1000 |           |
       +-----------+

Assume i and j are packed together on the stack, and BP stores the stack base address.

To access variable i, use [SP+4]. Thus, i = 10; could be written as mov [sp+4], 10

For j = i + 7, i must first be loaded into a register before the ALU operation is performed.

  • load i: mov r1, [sp+4]
  • add: add r1, 7

Then, store the result into j: mov [sp], r1. Finally, j ++ becomes inc [sp]

2.1.2.4.2.6.1 Load / Store, ALU Operations
2.1.2.4.2.6.2 force conversion

A forced conversion (cast) only cheats the compiler, not the assembler. The assembler knows only addresses.

2.1.2.4.2.7 Activation record: function call frame

Given the prototype:

void foo(int bar, int * baz) {
  char sninke[4];
  short * why;
  // ...
}

The arguments corresponding to the parameters and the local variables are placed close to each other.

4 byte
        |         | baz
        |         | bar
        | < ret > |
        |         | sninke
        |         | why

When it is called from another function, such as main:

int main (int argc, char * argv[]) {
  int i = 4;
  foo(i, &i);
  return 0;
}

Initially, we may have:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC <- sp

Then, space is allocated for variable i:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC
        |         |      <- sp

The assignment to i:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC
        |    4    | i    <- sp

When calling foo, the arguments for foo are pushed onto the stack from right to left, so &i (for baz) goes first and i (for bar) last:

4 byte
0xffff  |         | argv -> |  ||||
0xfffc  |         | argc
        | < ret > | Saved PC
        |    4    | i
        | argument| &i
        | argument| i    <- sp
2.1.2.4.2.8 Section XI: Swap, call in assembly
void foo() {
  int x = 11;
  int y = 17;
  swap(&x, &y);
}

In assembly, under the cdecl calling convention, arguments are pushed in reverse order (right to left):

_foo:
  push rbp
  mov rbp, rsp

  sub rsp, 8              ; x, y are 4 bytes each, total 8 bytes
  mov dword [rsp + 4], 11 ; x = 11
  mov dword [rsp], 17     ; y = 17

  lea rax, [rbp - 8]
  push rax                ; push &y first (rightmost argument)
  lea rax, [rbp - 4]
  push rax                ; then push &x
  call _swap
  add rsp, 16             ; clean up stack after call

  mov rsp, rbp
  pop rbp
  ret

swap itself may be written as:

void swap(int * a, int * b) {
  int tmp = *a;
  *a = *b;
  *b = tmp;
}

8 bytes are reserved for the saved pc and 16 bytes for the 2 arguments: on entry to swap, a is at rsp + 8 and b is at rsp + 16, since the program runs on an x86_64 machine. The leftmost parameter lies at the bottom of the stack frame, right above the saved pc.

In C:

// Build with GCC on x86_64 using Intel syntax: gcc -masm=intel
void __attribute__((naked)) swap(int *ap, int *bp) {

  asm volatile(
      // fetch arguments from stack
      "mov rcx, [rsp + 8];\n"   // rcx = ap
      "mov eax, [rcx];\n"       // eax = *ap
      "mov rcx, [rsp + 16];\n"  // rcx = bp
      "xchg eax, [rcx];\n"      // *bp = old *ap, eax = old *bp
      "mov rcx, [rsp + 8];\n"   // rcx = ap again
      "mov [rcx], eax;\n"       // *ap = old *bp

      "ret;\n");
}

void __attribute__((naked)) foo() {

  asm volatile(
      // initialize variables
      // push rbp;
      // mov rbp, rsp;
      // for better if possible
      "sub rsp, 8;\n"
      "mov dword ptr [rsp + 4], 11;\n"  // x = 11
      "mov dword ptr [rsp], 17;\n"      // y = 17

      "lea rax, [rsp];\n"       // &y, pushed first (right to left)
      "push rax;\n"
      "lea rax, [rsp + 12];\n"  // &x: rsp moved down by 8 after the push
      "push rax;\n"

      "call swap;\n"

      "add rsp, 16;\n" // clean up calling

      // clean up stack
      // also possible to use
      // `mov rsp, rbp; pop rbp;`
      // if bp is set
      "add rsp, 8;\n"
      "ret;\n");
}

int main(int argc, char *argv[]) {
  foo();
  return 0;
}

Here the swap function is not implemented like the C code shown earlier; it uses xchg instead.

2.1.2.4.2.9 Pre-process, Compile, Assemble, Link

Source Code -> Preprocessed Code -> Assembly Code -> Object File -> Executable File

2.1.2.4.2.9.1 Preprocessor
2.1.2.4.2.9.1.1 #define

Replaces text wherever it appears in the source file.

  1. constant replacement

    #define SIZE 1024
    char buf[SIZE];
  2. parameterized macro

    #define MAX(a, b) ((a) > (b) ? (a) : (b))
    int x = MAX(3, 5);
2.1.2.4.2.9.1.2 #include
2.1.2.4.2.9.2 compiler
2.1.2.4.2.10 Section XIII:

What if we comment out #include <stdio.h>?

The program can probably still be compiled.

What if we comment out #include <assert.h>?

assert will then be treated as a function rather than a macro, and linking fails because no library provides the symbol assert.

void foo() {
  int i;
  int array[4];
  /* i <= 7 instead of i <= 4: on x86_64 Linux, alignment may leave padding between array and i */
  for (i = 0; i <= 7; i ++) {
    array[i] = 0;
  }
}

This may loop forever: writing past the end of array is undefined behavior, and if one of the writes lands on i, i is reset to 0.

What will happen with:

void Declare() {
  int array[100];
  for (int i = 0; i < 100; i ++) {
    array[i] = i;
  }
}

void Print() {
  int array[100];
  for (int i = 0; i < 100; i ++) {
    printf("%d", array[i]);
  }
}

When Print is called right after Declare, the two functions have the same stack layout, so Print can print the values stored by Declare, because Declare does not clear the bit pattern of its frame when it returns. (Reading the uninitialized array is undefined behavior in standard C, so this is not something to rely on.)

This technique is called "channeling".

2.1.2.4.2.10.1 multiple arguments

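Arguments are pushed from right to left. This makes things easier for the compiler: the first argument (for example the format string of printf) always ends up at a known offset from the stack pointer, no matter how many arguments follow.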

2.1.2.4.2.11 Multiple Threads

The operating system gives each process its own virtual memory, so that a program can assume it owns all of memory.

The kernel tracks and maintains the virtual memory mapping table and uses the MMU to map the virtual memory of each process to real memory.

Program execution is sequential.

When multiple processes share the same data, one process may manipulate the data after another process has already manipulated it. E.g., a process reads a variable and checks that it meets some requirement; just as it is about to operate on the variable, the scheduler switches to another process, which successfully operates on the variable. When the scheduler dispatches the original process again, it does not validate the variable a second time and performs the same operation, which causes an error.

This situation is called a race condition.

Code always has some critical sections; while code is executing inside a critical section, it does not validate the shared data again.

The solution is to protect the critical section with a semaphore or a lock. When a process wants to enter the critical section, it tries to acquire the lock.

A semaphore is an integer variable with atomic operations. When it is 0, the process cannot enter the critical section; when it is greater than 0, the process can enter the critical section and atomically decreases the semaphore by 1. When leaving the critical section, the process increases the semaphore, releasing the resource.

Semaphore operations acquire and release resources.

2.1.2.4.2.11.1 Producer Consumer Problem

The producer generates data and puts it into a buffer. The consumer takes data from the buffer and processes it.

The consumer must not take data when the buffer is empty. The producer must not put data when the buffer is full.

Use two semaphores to track the number of empty slots and full slots in the buffer.

2.1.2.4.2.11.2 Reader Writer Problem

The reader-writer problem is a classic synchronization problem with two types of processes, readers and writers: readers can read the shared data simultaneously, while writers need exclusive access to it.

2.1.2.4.2.11.3 Philosophers Dining Problem

Every philosopher needs two forks to eat. Five philosophers sit around a table, and when a philosopher wants to eat, they try to pick up the left and right forks. But if all philosophers pick up their left fork first, none of them will ever be able to pick up the right fork.

This is a deadlock.

2.1.2.4.2.11.4 Ice cream Shop Problem
2.1.2.4.2.12 Functional Programming Paradigm

In the functional programming paradigm, each function is treated as a regular mathematical function, which accepts some input and produces some output.

;car
;cdr

car in Scheme extracts the first element of a list, while cdr extracts the rest of the list.

This is already well known, so, to keep it short, Mujiu will not explain more about Scheme here.

In Scheme, and in Lisp generally, the names car and cdr come from the machine on which Lisp was first implemented, the IBM 704, rather than from a Lisp machine instruction. A machine word there had an address part and a decrement part: car stands for "Contents of the Address part of Register" and cdr for "Contents of the Decrement part of Register".

2.1.2.5 programming theory
2.1.2.5.1  From The C Programming Language To Theoretical Computer Science (Section I) [S1]
2.1.2.5.1.1 Section I: C Programming Language

To get a first glimpse of computer science, we must know a programming language, which can then lead us to understand some key concepts in computers and in programming language design.

2.1.2.5.1.2 Intro

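The C language has a long history. Ever since it appeared alongside Unix in the 1970s, it has been a favorite of developers all over the world, and it is still widely used today. From the dazzling variety of applications at the top down to operating system kernels at the bottom, everything can be written in C and depends on C code.

For example, the vast majority of the world's servers run on Linux, and the Linux kernel consists almost entirely of C code. The kernel of every Android phone is in fact Linux as well. It is fair to say that C drives most of the devices in the world. (Windows is not used as the example here because, first, Windows is a closed-source product and, second, its kernel is written mainly in Microsoft's own dialects of C and C++.)

C is a high-level language. But what is a high-level language?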

2.1.2.5.1.3 High Level Language

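High-level languages are defined relative to low-level languages. Generally, by low-level languages we mean the assembly languages of different devices. These languages are very powerful, since they can operate the CPU directly, and also very fundamental: without them, none of the later work could be done.

But they also have a serious problem: they are tightly bound to a platform. A piece of code works only on one specific platform. Even when the logic is similar or exactly the same, you still have to adapt it to each platform in turn, following each platform's rules. That is only the development process, and it already shows how troublesome developing in a low-level language is. When it comes to upgrading the software, the whole procedure gets even scarier, and the complexity shoots up.

A high-level language is an abstraction over the common features of low-level languages. It helps programmers write code that can be ported between platforms painlessly, or at least fairly easily.

A low-level language is like a special tool made for one particular device that can only be used on that device. It can operate the hardware directly, but it is very complicated to write. A high-level language, such as C or Python, lets programmers write programs in a form that is easier to understand. The system "translates" your code into instructions the machine understands, so the program can run on different devices without you worrying about low-level details.

When working in C, we can abstract operations such as addition, subtraction, multiplication and division, which correspond to assembly instructions operating on data of different widths; we can also abstract variables, which correspond directly to a region of memory.

For example, take the addition of two numbers: in C, addition on any platform is written a + b. But in Intel-syntax macro assembly for the x86_64 architecture of IBM-compatible machines (what a long qualifier), it may be ADD AH, BH, ADD AX, BX, ADD EAX, EBX, or even ADD RAX, RBX. And this only covers the case where two general-purpose registers take part in the operation; with memory operands it gets much more complicated. (With AT&T syntax it can be more complicated still, since AT&T syntax also has to deal with instruction names.)

This makes porting programs much more convenient: there is no longer any need to adapt by hand to each platform.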

2.1.2.5.1.3.1 Mid-Level Language

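Although C is nominally a high-level language, many people do not think so, because C provides no general memory management scheme. All memory has to be managed manually by the programmer. This is convenient for systems programming, but it also causes many problems such as memory leaks, and one still has to think about boundary issues much as in low-level assembly.

For this reason some people call C a mid-level language, or a transitional language. But that is merely a difference in naming.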

2.1.2.5.1.3.2 Compile & Interpret

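A CPU can actually understand and run only binary machine code. Therefore the computer cannot directly execute code written in human-readable form; the code has to be compiled or interpreted.

source code --(compile)--> assembly file --(assemble)--> object binary --(link)--> executable
  1. Compilation is the process of compiling code into assembly language (or another language), generating the corresponding binary code with an assembler, and finally linking to produce a native executable (which in the end contains the structures the operating system requires).
source code --(interpreter)--> output
  2. Interpretation is the process of executing code as the source file is read, through a virtual machine or an interpreter, without going through compilation.

In fact, for modern languages the difference between compiled and interpreted languages is not that large. For example, Java both has to be compiled to JVM bytecode and needs the JVM to interpret the bytecode at run time.

We call a language compiled or interpreted according to how it tends to be run. For example, C requires compilation before running, so we consider C a compiled language. Python, which you may be familiar with, is executed by an interpreter, so Python is considered an interpreted language.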

2.1.2.5.1.4 Environment And IDE

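I don't know whether you like PC games; sometimes a game complains that the DirectX runtime environment is missing. Programming, like gaming, needs an environment. Generally, the environment dedicated to developing programs is called a development environment, and a software system that packages and preconfigures all the tools needed for development together with the development environment itself is called an integrated development environment (IDE).

On Windows, the most commonly used C IDE is Microsoft Visual Studio. However, this IDE and its accompanying programming environment are tailored for C++ and C#, and are not well suited to C. Its mandatory project management and overabundance of features also easily overwhelm beginners and distract them from the core of learning C.

On macOS, Apple provides the Xcode IDE, but hardly anyone uses it unless they have to write Swift.

On Linux, the most commonly used "IDEs" are (Neo)Vim and Emacs, but they do not suit everyone.

Since the platforms are hard to unify, and all three of them offer a fairly simple way to use the LLVM-Clang compiler as the C programming environment, here we will configure the environment by hand as the first step of learning C. This is a part that most tutorials, institutions and schools do not teach, yet it is crucial for later programming studies.

Two other parts that I personally consider important are the use of tools and the distinction between tools and knowledge, which can be found in "The Missing Semester of Your Computer Science Education" and "Introduction to Theoretical Computer Science" respectively.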

2.1.2.5.1.4.1 Environment Variables

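Environment variables can be regarded as a program's settings: they tell a program how to work. For example, configuring "PATH" helps programs find the files or commands they need.

Put simply, for a program this is like a dictionary index: when I want to look up some information, I first find the "key" in the index and then get the "value" by that "key".

These pairs can control how a program behaves. The environment variables that you need to know now, and that will remain important later, are:

  • PATH: the PATH variable is like a signpost that tells the system where to look for the commands you type.
  • For example, when you want to compile a program with gcc, the system searches the directories listed in PATH for the gcc program. If it cannot be found, an error is reported.
  • When we type commands in the console (command line) and try to execute them, the operating system searches through the PATH environment variable; if the command is found it is executed, otherwise an error is reported.
  • Of course, PATH is not only needed when we run commands ourselves; many other programs also use PATH to find the programs they need, for example a build tool looking for the compiler.
  • Well, for now knowing PATH alone is enough (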
2.1.2.5.1.4.2 Windows

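On Windows, modifying environment variables is convenient and safe:

Open File Explorer, right-click "This PC", and in the pop-up menu choose "Properties" - "Advanced system settings" - "Advanced" - "Environment Variables" to see the environment variable configuration window.

To edit any of them, just double-click the item and the corresponding edit dialog appears.

So, to install a C development environment by hand, you first download a compiler and then add the path where the compiler lives to the PATH environment variable as described above. Compared with other approaches, though, this is not only inconvenient but also very troublesome whenever the development environment needs updating.

Of course, Windows also has an easier way to install a C programming environment: WSL.

WSL stands for "Windows Subsystem for Linux". It is a tool created by Microsoft to improve the developer experience. With WSL we can install and manage a development environment very easily, just as if we were using Linux directly.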

2.1.2.5.1.4.3 Linux, MacOS & *nix

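On Unix and Unix-like systems, changes to environment variables are usually tied to the user's configuration files. In practice, however, installing a C programming environment on such systems requires hardly any changes to environment variables and can be done with a few commands.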

2.1.2.5.1.5 Hello, World

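And so we arrive at our first program: Hello, World!

This example comes from The C Programming Language, and it has accompanied generation after generation of new programmers, carrying our cheers for the new world we create ourselves.

"Hello World" is the classic introductory example in programming. It simply prints one sentence to the screen and helps you understand the basic structure of code and how it runs. Once you have learned how to write and run "Hello World", you can start learning more complex programs.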

#include <stdio.h>

int main(void) {
  printf("Hello, World!\n");
  return 0;
}

You can type this code into any text editor and save it (not on the desktop) as hello.c.

Then we can start compiling:

  1. Open a terminal,
  2. Enter the directory: cd ${pwd}, where ${pwd} is the directory your file is placed in,
  3. Check that the file hello.c exists: type cat hello.c and press enter. Right after the command is entered, the content of the whole file is displayed. If the content printed on the screen does not match what is shown in your editor, you have not saved the file properly. For example, the command responds with:

    #include <stdio.h>
    
    int main(void) {
      printf("Hello, World!\n");
      return 0;
    }

    on my computer with the code shown above.

  4. Finally, type clang hello.c -o hello; it prints nothing if there are no syntax errors or other problems.

We then get a file named hello (on Windows it is hello.exe, where hello is the file name and .exe is the extension). You may find it in the file explorer. This is our target executable!

Finally, type ./hello in the terminal to run it, and you will see the result of its execution:

Hello, World!

With this, you have completed the basic components of a C program. Below we briefly introduce, one by one, what each of them means, so that you can try modifying the program yourself and write your very own "Hello World".

Try changing the source code, for example to make it print your name.

2.1.2.5.1.5.1 Explanation

Looks fantastic?

Let us now explain the structure of our current program.

A C program is always composed in a similar order. Our "Hello, World" program has three parts: the inclusion of a library header file, the entry function (main), and the main expression.

2.1.2.5.1.5.2 Library

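The core of C is very small and includes only some very basic features; everything else is provided through libraries. And because the language is rather bare-bones, using a library requires a description file that tells the compiler which features the library provides.

In this program, for instance, the first thing is a line of text beginning with '#'. It means that we include the definitions of a library named stdio.

The '#' sign actually marks the start of a "preprocessing directive"; the directive here is "include". The include directive is used to include a file, in this case the file stdio.h.

Stdio is short for "Standard Input / Output". It defines the commonly used input and output functions, and it will be the most frequently used library in the C programming that follows.

So how does the include directive decide which file to include? It depends on what delimiters wrap the file name. Here we wrapped stdio.h in angle brackets ('<' and '>'), which means the compiler searches the system paths; if the file is found, it is expanded in full at the position of the directive. If instead we wrapped stdio.h in double quotes ('"'), the compiler would first try to find the file in the current directory.

You can try creating a stdio.h file in the same directory as hello.c and compiling the program again to see whether anything changes.

And what if you change the angle brackets to double quotes? For example, the printf "function" that we discuss below is made known to the compiler by the stdio.h file.

So what is a function? Let's keep that a secret for now; functions are explained in detail later.

Next comes the body of our program.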

2.1.2.5.1.5.3 main
int main(void) {
  // ...
}

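This part is where our program starts executing. Without it, our program cannot run.

You can try: what happens if you leave this part out and write only the printf("Hello, World!\n"); in the middle? When we press the run button, we are told that the program is not "legal". This does not mean we have done something against the law, of course; it means such a program does not conform to the syntax of C.

Also, if you look at the "PROBLEMS" panel at the bottom of Visual Studio Code, you can see it reporting that this file has many problems. We call the information it reports error messages, or errors.

We call this part the "main function definition", and this main is the main function.

It can basically be regarded as a fixed form (there are four fixed forms in total, three for hosted environments and one for freestanding environments, but for now you only need to know this one).

printf("Hello, World!\n");

is the only body of our program. The program really does just one thing: it prints "Hello, World!".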

2.1.2.5.1.5.4 Function

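Both of the parts above mention a concept: the "function". What is a function? A function is a collection of code, a collection of functionality. By defining functions we can group different operations together, which makes program development easier. Such functions can likewise be provided for ourselves or for others to use.

Examples are the printf function we used and the main function we defined.

Like functions in mathematics, a function can accept some arguments and produce some output. Just like a function of a vector in multivariable calculus,

f(x, y, z): ℝ^3 → ℝ

accepts arguments such as x, y, z and, through a series of transformations, turns them into an ordinary one-dimensional value.

The combination of printf and the parentheses after it is called a function call. Its meaning is the same as that of a function in mathematics.

printf(...) prints text to the screen according to a format; "print (with) format" is exactly what the name means.

The "Hello, World" here is the argument of the function call; it tells the printf function what to output to the screen.

This is only a brief introduction to what it does, though. The printf function can do far more than this! A later chapter is devoted to its features.

return 0;

This statement ends the function "main". When the compiler sees this statement, it knows that the execution of the function ends here: "return".

This also touches on some later material, so for now just remember that the main function should end with the statement return 0;.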

2.1.2.5.1.5.5 Expression: Statement.

If you look closely, you will notice that both things inside the main function end with a semicolon.

The semicolon (';') marks the end of a statement. What is a statement? Statements are the basic units of the C programming language, and every C program is made up of statements. For example, our simplest program is:

int main(){}

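Here the program contains only a function definition with an empty body. Anything that actually does work, though, is written as statements inside such a body.

Statements come in many flavours, but the rules for them are much the same. Apart from a few special cases, everything we write in C ends with a semicolon.

Statements can be roughly divided into five kinds:

  1. Expression statements
  2. Function calls
  3. Flow control statements
  4. Compound statements
  5. Empty statements

Each kind is explained in detail later. For now, remember that, apart from those few special cases, every statement needs a semicolon at its end.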

2.1.2.5.1.6 Types

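C is a statically typed language. That one sentence involves two new ideas!

  • What is a type?
  • What is static typing?

As a computer language, C really just operates on values. For different values, we decide by convention what "type" they are.

For example, we treat whole numbers between $-2147483648$ ($-2^{31}$) and $2147483647$ ($2^{31} - 1$) as "integers (Integer)". We also need to represent text, so there are the so-called "character (Character)" type and "string ([Character] String)" type.

But why do different types need to be told apart? Clearly a string cannot be handled as if it were an integer, right? (Unless you view both as monoids in the sense of category theory... and even then you only get a uniform set of operations; you still cannot add a string to a number~)

So what is static typing?

Just as mathematics is not only about manipulating numbers but, most of the time, about unknowns, computer programs have their own "unknowns" to work with. When we compute something, we often need a thing called a "variable" to store intermediate results. Since a variable stores data, it needs a type as well. After all, as explained just above, data of different types have different properties and cannot be stored in the same way.

C goes one step further: to keep a variable's type from becoming unclear after repeated assignments, it makes us fix the type of data a variable can hold at the moment we define it. (That is not the real reason, of course. C needs the type information to allocate space for a variable, and different types generally need different amounts of space, so they cannot be mixed. This is covered in detail in the "memory model" part, meow~ >w<) This is what we mean by a "static type" system.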

2.1.2.5.1.6.1 Literal

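A literal is like the coefficients and constants we write down when solving a maths problem: a literal is a constant value that appears directly in the program text.

There is a difference between literals and constants, though. A literal truly cannot be changed, whereas a constant in a computer program merely states that a variable will not be changed... With some special tricks, a constant can be persuaded to open its heart and accept a new value.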

2.1.2.5.1.6.2 Basic Data Types

For simple programming tasks, C defines some basic data types. They cover numbers, text and logic (well, not really).

2.1.2.5.1.6.2.1 Integer

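The ones we use most often, and so introduce first, are the integer family:

  • short : short integer. It needs less memory than int, typically 16 bits, but it can represent correspondingly fewer values.
  • int : integer, the default data type in C. It is typically 32 bits wide, of which 31 binary digits carry the magnitude; the range $-2147483648$ to $2147483647$ mentioned above is what it can represent.
  • long : long integer. It may be longer than int and is generally used only when handling large values.
  • long long : really long integer, at least 64 bits.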
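The widths in the list are typical values rather than fixed ones, so it can help to ask the compiler directly. The sketch below reports how many bits each integer type occupies on the machine that compiles it; the helper names are made up for illustration.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits each integer type occupies on the machine compiling this.
   CHAR_BIT is the number of bits in one byte (one char). */
size_t short_bits(void)     { return sizeof(short) * CHAR_BIT; }
size_t int_bits(void)       { return sizeof(int) * CHAR_BIT; }
size_t long_bits(void)      { return sizeof(long) * CHAR_BIT; }
size_t long_long_bits(void) { return sizeof(long long) * CHAR_BIT; }
```

On a typical 64-bit Linux machine these would report 16, 32, 64 and 64; on 64-bit Windows long is usually 32 bits. Only the minimums (16, 16, 32, 64) are guaranteed.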

Whenever we write an integer in code, it naturally carries the information of one of the types above. For example:

short s = 0;
int i = 65536;
long l = 2147483647;
long long ll = 2147483648ll;

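Note: all of the code above is written inside the main function!

Here 0, 65536 and 2147483647 are all literals of type "int", while 2147483648ll is a literal of type "long long".

What do the type and the equals sign in front of these numbers do, though? You will find out very soon! First let's meet the variants of the integers:

  • signed : the signed prefix. It says the type holds signed data. Integer types are generally signed already.
  • unsigned : following on from the previous item, when we do not need the negative half we can use an unsigned type. Declaring a variable unsigned changes its range from half positive and half negative to entirely non-negative, a bit like adding a superscript $+$ to $\mathbb{Z}$ to get $\mathbb{Z}^{+}$. On top of that, the range on the positive side doubles.
  • Although they are called prefixes, they can also work "solo": when only the prefix appears, C (the standard) automatically supplies an int for it.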

Here are a few more examples:

signed int i = 2147483647;
unsigned int u = 2147482647u;

Integer literals may be written as:

<number>*<suffix>     for decimal notation     ; 10, 11, 5
0<number>*<suffix>    for octal notation       ; 0, 01, 077
0x<number>*<suffix>   for hexadecimal notation ; 0x0, 0x1a, 0xff
0b<number>*<suffix>   for binary notation      ; 0b1, 0b0, 0b10 (C23, or a compiler extension before that)
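As a small check that these notations are only different spellings of the same numbers, the sketch below writes one value in decimal, octal and hexadecimal. The constant and function names are made up for illustration, and the binary form is left out because it needs C23 or a compiler extension.

```c
/* The same value written in three different bases. */
static const int decimal_value = 63;
static const int octal_value   = 077;  /* leading 0: octal */
static const int hex_value     = 0x3f; /* leading 0x: hexadecimal */

/* Returns 1 when all three spellings denote the same number. */
int all_equal(void) {
    return decimal_value == octal_value && octal_value == hex_value;
}
```

A leading zero silently switches to octal, so writing 010 when you mean ten gives eight.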
2.1.2.5.1.6.2.2 Literal Suffix

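Some of you may have noticed that some of our numbers are followed by letters. These letters, such as ll and ull, are called literal suffixes. They decorate a literal so that the compiler handles the value correctly.

Now look at this line: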

long long ll = 2147483648ll;

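Try removing the literal suffix ll from this line and see what happens. With a compiler that follows C99 or later the program still compiles, although an older (C89) compiler may warn about it.

This is because an unsuffixed decimal integer literal in C is given type int by default, and if the value is outside the range of int, the compiler moves on to long and then long long until the value fits. Writing the suffix states the intended type explicitly instead of leaving it to these rules.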
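One way to see which type a literal really gets is the C11 _Generic selection, sketched below. TYPE_NAME and the four helper functions are hypothetical names introduced only for this illustration, and the example assumes a compiler in C11 mode or later.

```c
/* Maps an expression to the name of its type (C11 _Generic).
   TYPE_NAME is a hypothetical helper macro, not a standard one. */
#define TYPE_NAME(x) _Generic((x),            \
    int: "int",                               \
    unsigned int: "unsigned int",             \
    long: "long",                             \
    unsigned long: "unsigned long",           \
    long long: "long long",                   \
    unsigned long long: "unsigned long long", \
    default: "other")

const char *type_of_zero(void)     { return TYPE_NAME(0); }
const char *type_of_big(void)      { return TYPE_NAME(2147483648); }
const char *type_of_big_ll(void)   { return TYPE_NAME(2147483648ll); }
const char *type_of_unsigned(void) { return TYPE_NAME(10u); }
```

With a 32-bit int, type_of_big() reports "long" where long is 64 bits and "long long" where long is 32 bits, while the suffixed literal is "long long" everywhere.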

2.1.2.5.1.6.2.3 Real numbers: float & double

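Besides integers, we naturally also have numbers with a fractional part. In C these are called "binary floating-point numbers", or "floating-point numbers" for short.

C has three commonly used floating-point types:

  • float : single-precision floating point, typically 32 bits. Unlike an integer type, a floating-point type does not have an exactly representable range.
  • double : double-precision floating point, with higher precision than float. This is the type of a literal such as 2.0.
  • long double : an upgraded version of double.

Why is a floating-point number called "floating point"? Because its decimal point is not fixed, of course.

Some may still wonder what a fixed decimal point is. Can't a number have infinitely many decimal places? The answer, once again, lies in the limits of what a computer can represent.

For example, an amount of money can generally be written as "XX yuan Y jiao Z fen", right? If we want to express it all in "yuan", we write "XX.YZ yuan". In effect we unify every unit to "yuan" and pin "jiao" and "fen" to the two places after the decimal point. This is a so-called "fixed-point number", or more precisely a "fixed-point number scaled by 100".

With fixed-point numbers understood, "floating-point numbers", or "moving-point numbers" (a name I just made up), are easy to grasp. Fixed-point numbers are too rigid and fit only certain situations. So the idea arises: if we record the position of the decimal point in some way, we can represent fractional numbers of any scale. Thus floating-point numbers were born. The fixed-point numbers above are decimal, with base 10, but a computer represents data in binary, so the floating-point numbers we actually use are binary too. That explains the name "binary floating-point number".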

2.1.2.5.1.6.2.4 Type Boost

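Mathematics has operations that mix integers and fractions, of course. First try this: when we do a calculation in C that ought to produce a fraction, what result do we get?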

printf("%d", 1 / 2);

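The result is 0. Strange, isn't it?

In C, an operation between two integers only ever produces an integer. If we want a floating-point result, a floating-point number has to take part in the operation, for example: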

printf("%f", 1 / 2.0);

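This way we get 0.5.

Why? In C, when the types involved in an operation differ, the value whose type has the smaller range is converted to the type with the larger range before the operation is carried out. We call this process automatic type conversion.

Here the int value 1 meets 2.0, a floating-point number of type double. The range of the floating-point type is larger than that of the integer, so the int is promoted to double and takes part in the operation, giving 1.0 / 2.0 = 0.5.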
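The two divisions can be put side by side as small functions. This is only a sketch; the names int_divide and real_divide are invented for the example.

```c
/* Both operands are int, so the division is done in int and the
   fractional part is dropped. */
int int_divide(int a, int b) { return a / b; }

/* Converting one operand to double makes the other one convert too,
   so the division is done in double. */
double real_divide(int a, int b) { return (double)a / b; }
```

With these, int_divide(1, 2) gives 0 while real_divide(1, 2) gives 0.5, even though both start from the same two integers.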

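The chart below shows the order of automatic type conversion:

small -------------------------------------------------------> large
char, short, int   unsigned int   long   long long   float   double   long double

From left to right, each type is automatically promoted to the next.

Conversion that starts from the integers is called "integer promotion". As the chart shows, char, short and int sit at the same stage. That is because char and short undergo the same integer promotion: by the rules of C, every type whose range is smaller than that of int is promoted to int before it takes part in an operation.

A char, a short int, or an int bit-field, in either their signed or unsigned varieties, or an object of enumeration type, may be used in an expression wherever an integer may be used. If an int can represent all values of the original type, the value is converted to int; otherwise it is converted to unsigned int. This process is called integral promotion.

Seen from assembly, this amounts to widening a small register into a relatively larger one, for example promoting the AH register to the EAX register.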
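Promotion is invisible in the source code, but sizeof can reveal the type an expression ends up with. The following sketch uses two made-up helper functions; sizeof does not evaluate its operand, it only looks at the type.

```c
#include <stddef.h>

/* Adding two chars: both are promoted to int first, so the result
   has the size of int, not the size of char. */
size_t size_of_char_sum(void) {
    char a = 1, b = 2;
    return sizeof(a + b);
}

/* Adding an int and a double: the int is converted to double. */
size_t size_of_mixed_sum(void) {
    int i = 1;
    double d = 2.0;
    return sizeof(i + d);
}
```

The first function returns sizeof(int) and the second returns sizeof(double), whatever those happen to be on the machine at hand.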

2.1.2.5.1.6.2.5 String & Char

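The other part, beyond numeric values, is the character type and strings.

When studying mathematics, we simply write the computed result after the word "Solution"; that is really an "output" step that presents the result. So how does a computer, which also does mathematical calculation, organise its output? With strings, of course: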

printf("This Is A String");

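It is the familiar printf again; what differs is the string it works on.

A string, as the name suggests, is a sequence of consecutive characters. We generally write a string literal as a run of text enclosed in double quotes.

So how do we write a single character?

Simple: besides double quotes we also have single quotes. Ideally, any single character enclosed in single quotes is a character. But some characters cannot be typed on a keyboard at all, so there are other forms too:

  • 'c' : a character enclosed in single quotes
  • '\ooo' : a character given in octal
  • '\xhh' : a character given in hexadecimal

Of course, some characters go far beyond what a character can hold (8 bits), so we have another character type, the "wide character" type.

  • L'c' : a wide character enclosed in single quotes
  • L'\ooo' : a wide character given in octal, enclosed in single quotes
  • L'\xhhhh' : a wide character given in hexadecimal, enclosed in single quotes

As you can see, a wide character literal is just an ordinary character literal with an "L" prefix. In the same way, we can turn an ordinary string literal into a wide string: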

wprintf(L"Hello World");

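Note: Chinese characters are all outside the range a character type can represent, so why can an ordinary string hold Chinese text, as in printf("你好, 世界");? Because in a string it is not necessarily the case that one character variable holds one character. That may sound like a tongue-twister now, but it will be easy to understand once we cover how strings are actually represented.

So wide strings are not particularly necessary for representing text.

By the way, did you notice that when we described the integer types we never mentioned an 8-bit integer, the counterpart of the byte type that is so common in other languages? That is because C uses the char type in place of an 8-bit integer. Fortunately 8-bit values are not needed very often in C, so this substitution is not much of a problem. When we really need one, the char type can stand in for it.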
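Because char doubles as a small integer, ordinary arithmetic works on characters. The sketch below assumes an ASCII-compatible character set (true on practically every current system); the function names are invented for the example.

```c
/* A char is a small integer, so arithmetic works on it directly. */
char next_letter(char c) { return (char)(c + 1); }

/* Position of a lowercase letter in the alphabet, by plain subtraction. */
int letter_index(char c) { return c - 'a'; }

/* The three spellings of one character: plain, octal and hexadecimal.
   Assumes ASCII, where 'A' has the code 65. */
int same_character(void) { return 'A' == '\101' && 'A' == '\x41'; }
```

Here next_letter('A') is 'B' and letter_index('c') is 2, which is the usual trick for turning letters into array positions.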

2.1.2.5.1.6.3 Logical Values

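Of course, computers do not only process numbers. Being assembled from a pile of diodes, transistors and logic gates, they have a natural binary representation, and binary logic is also one of the things programs deal with.

Let's start simple. Logic has two states, yes or no, and C represents them in a very simple way:

  • the value 0: no ( false ),
  • anything else: yes ( true ).

Simple, isn't it?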

2.1.2.5.1.6.4 Void Type

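The types above are all quite concrete. But what if we need to say "there is nothing here"?

That is when we need the void type. We will not explain much here; we will see it at work when we use it.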

2.1.2.5.1.7 Mathematics Operations

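Having numbers is not enough to compute with; we also need to define operations on them.

First of all, every numeric value, whether from the integer family or the floating-point family, supports the familiar four arithmetic operations: + , - , * and / .

Operations  Description                                     Form   Comment
+           Returns the sum of two numbers                  A + B
-           Returns the first number minus the second       A - B
*           Returns the product of two numbers              A * B
/           Returns the first number divided by the second  A / B

Because taking the remainder is so useful, C also defines remainder operations for integers and for floating-point numbers, and calls this operation "modulo":

Operations  Description            Form         Comment
%           Modulo                 A % B        Integer operands only
fmod        Floating-point modulo  fmod(A, B)   A function call; works on double values
fmodf       Floating-point modulo  fmodf(A, B)  A function call; works on float values
fmodl       Floating-point modulo  fmodl(A, B)  A function call; works on long double values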
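For integers, / and % belong together: the quotient is truncated toward zero (guaranteed since C99) and the remainder makes up the difference. The sketch below uses two invented helper names and leaves out the floating-point functions from the table.

```c
/* Integer remainder. Since C99 the quotient is truncated toward zero,
   so the remainder has the same sign as the dividend a. */
int remainder_of(int a, int b) { return a % b; }

/* The identity that ties / and % together for integers (b must not be 0). */
int identity_holds(int a, int b) { return (a / b) * b + a % b == a; }
```

So remainder_of(7, 3) is 1 and remainder_of(-7, 3) is -1, which surprises people who expect a result between 0 and 2.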

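Next are four operators that C provides for variables, most commonly integer variables. They are called the "increment/decrement operators":

Operations  Description  Form  Comment
++          Increment    A++   Returns the original value first, then increases the variable by 1
++          Increment    ++A   Increases the variable by 1 first, then returns the new value
--          Decrement    A--   Returns the original value first, then decreases the variable by 1
--          Decrement    --A   Decreases the variable by 1 first, then returns the new value

You may notice the pattern: if the operator comes before the variable, the variable is updated first and its value is taken afterwards; if the operator comes after the variable, the value is taken first, and the variable is incremented or decremented once that value has taken part in the operation.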

int i = 0;
printf("%d", i++); // => 0, i = 1;
printf("%d", ++i); // => 2, i = 2;
printf("%d", i);   // => 2
printf("%d", i--); // => 2, i = 1;
printf("%d", --i); // => 0, i = 0;
printf("%d", i);   // => 0

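You can also see that these operators are described as acting on a "variable" rather than on a value. So what is a variable? As mentioned earlier, a variable is something that stores a value. Since a variable can store a value and can take part in operations, it is natural to have operators that work on the value stored in the variable itself. Besides the increment and decrement operators covered here there are others, such as the assignment operator.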

2.1.2.5.1.7.1 Relation Operations

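Besides arithmetic, we can also compare values. In C, the operators that compare the relative size of values are called "relational operators".

Relational operators work on all numeric values. For strings, comparison is so common that string comparison functions are included in the standard library. Do you remember what a "library" is? A library is something written by other people, rather than provided by the C language itself, that defines a set of useful functions ready to be imported.

Well, that was a digression. Here are the commonly used relational operators (and functions):

Operations  Description            Form             Comment
==          Equal                  A==B             Returns 1 if A equals B
!=          Not equal              A!=B             Returns 1 if A does not equal B
>           Greater than           A>B              Returns 1 if A is greater than B
<           Less than              A<B              Returns 1 if A is less than B
>=          Greater than or equal  A>=B             Returns 1 if A is greater than or equal to B
<=          Less than or equal     A<=B             Returns 1 if A is less than or equal to B
strcmp      String comparison      strcmp(A, B)     Returns 0 if the strings are equal; otherwise a negative or positive value by dictionary order
memcmp      Memory comparison      memcmp(A, B, n)  Compares the first n bytes; returns 0 if equal, otherwise a negative or positive value

One thing must be kept in mind: C has no chained inequalities. In other words, an expression such as $A > B > C$ cannot be written in C with its mathematical meaning.

So what happens if we write such code by accident, say 1 < a < 10?

C treats this as an expression made of consecutive operations: the first operation is evaluated, and its result then takes part in the next one. Such consecutive operations follow precedence rules, just as in a mathematical formula containing both addition or subtraction and multiplication or division, where multiplication and division always go first.

For the expression above, 1 < a is evaluated first, and the result, whether 1 or 0, is handed to the comparison with 10. As a consequence, the result of this expression is always 1.

So be careful not to write "chained inequalities".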
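The pitfall is easy to reproduce. The sketch below spells out, with explicit parentheses, how C reads 1 < a < 10; the function name chained is invented for the example.

```c
/* How C actually evaluates 1 < a < 10: the left comparison yields 0 or 1,
   and that result is then compared with 10. The parentheses only make
   the grouping visible; they do not change the meaning. */
int chained(int a) { return (1 < a) < 10; }
```

Whatever a is, the function returns 1: chained(5), chained(100) and chained(-5) all give the same answer, which is clearly not a range test.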

2.1.2.5.1.7.2 Logical Operations

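Logical operations are also something C does all the time. So what is a logical operation?

A logical operation strings several logical values together and determines whether the final result is true or false.

For instance, as just mentioned, C has no chained inequalities. How then do we express a chained inequality? This is where logical operations come in.

There are three main logical operations, OR, AND and NOT:

Operations  Description  Form  Comment
&&          Logical AND  A&&B  Returns 1 if both A and B are non-zero
||          Logical OR   A||B  Returns 1 if at least one of A and B is non-zero
!           Logical NOT  !A    Returns 1 if A is 0; returns 0 if A is non-zero

From this you can also see that logical AND, OR and NOT are quite different from logic gate operations. Bitwise logical operations are therefore introduced separately and in detail later...

Back to expressing a chained inequality. It is enough to write: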

1 < a && a < 10

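Note that the logical operators && and || "short-circuit". What does that mean? If the left-hand side of the operator is already enough to decide the overall result, the right-hand side is not evaluated at all, and the result of the logical operation is returned directly.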
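Both points can be sketched in a few lines: the range test written with &&, and a case where short-circuiting keeps the right-hand side from running at all. The function names are invented for the example.

```c
/* The range test written with a logical AND. */
int in_range(int a) { return 1 < a && a < 10; }

/* Short-circuiting protects the right-hand side: when b is 0 the left
   side is false, so a % b is never evaluated and no division by zero occurs. */
int divides(int a, int b) { return b != 0 && a % b == 0; }
```

Here in_range(5) is 1 and in_range(100) is 0, and divides(10, 0) safely returns 0 instead of crashing.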

2.1.2.5.1.7.3 Associativity

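As mentioned above, operator precedence and associativity determine the order in which an expression made of consecutive operations is carried out. So what exactly are the rules?

In the table below, operators nearer the top are applied earlier (higher precedence); the last column gives the direction in which operators of the same level are grouped.

Operations                         Description      Associativity
() [] -> . ++ --                   Postfix          Left to right
+ - ! ~ ++ -- (type) * & sizeof    Unary            Right to left
* / %                              Multiplicative   Left to right
+ -                                Additive         Left to right
<< >>                              Shift            Left to right
< > <= >=                          Relational       Left to right
== !=                              Equality         Left to right
&                                  Bitwise AND      Left to right
^                                  Bitwise XOR      Left to right
|                                  Bitwise OR       Left to right
&&                                 Logical AND      Left to right
||                                 Logical OR       Left to right
? :                                Conditional      Right to left
= += -= *= /= %= >>= <<= &= ^= |=  Assignment       Right to left
,                                  Comma            Left to right

Complicated, isn't it? Never mind. Whenever you are unsure about operator precedence, simply put parentheses around the operations in the order you want, to say that they must go first. The rest matches mathematical intuition quite well.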
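A few one-line functions are enough to see precedence and associativity at work. The names are invented for the example, and each comment states which operation runs first.

```c
int without_parentheses(void) { return 2 + 3 * 4; }   /* 3 * 4 first */
int with_parentheses(void)    { return (2 + 3) * 4; } /* 2 + 3 first */

/* Assignment associates right to left: b = 5 runs first, then a = (b = 5). */
int chained_assignment(void) {
    int a, b;
    a = b = 5;
    return a + b;
}

/* Subtraction associates left to right: (10 - 4) - 3, not 10 - (4 - 3). */
int left_to_right(void) { return 10 - 4 - 3; }
```

The four functions return 14, 20, 10 and 3 respectively.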

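You may notice that, besides the basic arithmetic operations we have covered, this table contains some operators we have never seen.

If you look closely, apart from logical AND and logical OR the table also has bitwise AND, OR, XOR and NOT. We will get to know them very soon.

PS. Another important group is the family of assignment operators, which is introduced after the syntax of C has been presented in full.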

2.1.2.5.1.7.4 Binary Calculation

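Now we need a little simple mathematics: binary arithmetic.

First, what is binary arithmetic? It is arithmetic on binary numbers. That sounds like a tautology (and it is one), yet it carries a lot of meaning.

First of all, it says that the objects being operated on are binary numbers, that is, numbers that carry over to the next position whenever a position reaches two.

The base of binary is 2, and each digit can only be 0 or 1.

Binary numbers have some special properties. The most notable advantage is that each digit has only two states, which matches the on and off of a circuit exactly and makes the computer's job easier. Another property is that binary converts easily to and from hexadecimal and octal, although this is really an advantage of hexadecimal and octal, since their bases are both powers of two.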

2.1.2.5.1.7.5 Radix Convert

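Binary is friendly to computers but rather awkward for humans, because we deal with decimal all year round.

That is why we need to handle all kinds of "radix conversion" problems.

Binary and decimal both denote numbers from the same set, so they can be converted into each other by fixed rules.

Converting binary to decimal means multiplying each digit by the corresponding power of two. That may sound complicated, but it is very simple in practice. For example, given the binary number 1011, its decimal value is:

$(1011)_{2} = 1 \times 2^{3} + 0 \times 2^{2} + 1 \times 2^{1} + 1 \times 2^{0} = (11)_{10}$

Converting decimal to binary is similar: keep dividing the decimal number by two and take the remainders:

$$\begin{aligned} 11 \div 2 &= 5 \quad \text{remainder } 1 \\ 5 \div 2 &= 2 \quad \text{remainder } 1 \\ 2 \div 2 &= 1 \quad \text{remainder } 0 \\ 1 \div 2 &= 0 \quad \text{remainder } 1 \end{aligned}$$

Finally, write the remainders from bottom to top to get the corresponding binary number.

As said above, converting between binary and hexadecimal or octal is very convenient. How convenient, exactly? To convert binary to hexadecimal, group the digits four at a time, pad the high end with 0 if needed, and replace each group with its hexadecimal digit. Octal is similar: group three at a time, pad the high end with 0, and replace each group with its octal digit.

Continuing with 1011 as the example:

$(1011)_{2} = (B)_{16}, \quad (1011)_{2} = (001\,011)_{2} = (13)_{8}.$

The reverse direction works in exactly the same way, which is very handy.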
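The two conversion procedures translate almost directly into code. This is a sketch rather than a library-quality routine: to_binary and from_binary are invented names, to_binary returns a pointer into a static buffer that the next call overwrites, and from_binary adds up the powers of two incrementally (multiply by 2, add the next digit), which gives the same sum as the formula above.

```c
#include <limits.h>
#include <stddef.h>

/* Converts n to binary by repeated division by 2, collecting remainders.
   Returns a pointer into a static buffer that the next call overwrites. */
const char *to_binary(unsigned int n) {
    static char buffer[sizeof(unsigned int) * CHAR_BIT + 1];
    size_t pos = sizeof buffer - 1;

    buffer[pos] = '\0';
    do {
        buffer[--pos] = (char)('0' + n % 2); /* remainder is the next digit */
        n /= 2;
    } while (n != 0);
    return &buffer[pos]; /* filling from the end reads the remainders bottom-up */
}

/* Converts a string of binary digits back to a number: each step doubles
   the value so far and adds the new digit. */
unsigned int from_binary(const char *digits) {
    unsigned int value = 0;
    while (*digits == '0' || *digits == '1') {
        value = value * 2 + (unsigned int)(*digits - '0');
        digits++;
    }
    return value;
}
```

With these, to_binary(11) yields the string "1011" and from_binary("1011") yields 11, matching the worked example.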

2.1.2.5.1.7.6 Bitwise Operations

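Besides ordinary arithmetic, binary also offers some special operations. In C these show up as the bitwise operations.

In a computer, gate circuits provide the following logic gates in total: AND, OR, NOT, NAND, NOR, XOR and XNOR.

Their logic can be summarised in the following table:

Operations  Description    Form      A     B     Result
AND         And            A AND B   1010  1100  1000
OR          Or             A OR B    1010  1100  1110
XOR         Exclusive or   A XOR B   1010  1100  0110
NAND        Not and        A NAND B  1010  1100  0111
NOR         Not or         A NOR B   1010  1100  0001
XNOR        Exclusive nor  A XNOR B  1010  1100  1001
NOT         Not            NOT A     1010  -     0101

Their rules are in fact very simple:

  • An AND gate outputs 1 if and only if both inputs are 1; otherwise it outputs 0.
  • An OR gate outputs 1 as soon as one input is 1; otherwise it outputs 0.
  • A NOT gate inverts its input: if the input is 1 it outputs 0, otherwise it outputs 1.
  • A NAND gate is an AND gate inverted: it outputs 0 only when both inputs are 1, and 1 otherwise.
  • A NOR gate is an OR gate inverted: it outputs 1 only when both inputs are 0, and 0 otherwise.
  • The point of an XOR gate is "different": it outputs 1 when the two inputs differ, and 0 otherwise.
  • XNOR is XOR inverted: it outputs 1 when the inputs are the same, and 0 otherwise.

So every gate whose name contains "not" can be built from AND, OR and inversion, and all the other gates can in turn be built from NAND gates alone.

The computer's underlying hardware has logic gate operations, and C has the corresponding bitwise operations. A bitwise operation applies a gate operation to each bit of a multi-bit binary number. There are four:

Operations  Description  Form  Comment
&           Bitwise AND  A&B   Sets a bit to 1 if the corresponding bits of A and B are both non-zero
|           Bitwise OR   A|B   Sets a bit to 1 if at least one of the corresponding bits of A and B is non-zero
^           Bitwise XOR  A^B   Sets a bit to 1 if exactly one of the corresponding bits is non-zero, otherwise to 0: different gives 1, same gives 0
~           Bitwise NOT  ~A    Sets each bit to 1 if it was 0, and to 0 if it was non-zero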
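The gate table can be reproduced with the four C operators. The sketch below works on four-bit values to match the table (A = 1010, B = 1100); MASK4 and the function names are invented for the example, and the mask is needed because ~ flips every bit of an unsigned int, not just the low four.

```c
/* Four-bit versions of the gate table. The mask keeps only the low
   four bits after an inversion. */
#define MASK4 0xFu

unsigned int and4(unsigned int a, unsigned int b)  { return a & b; }
unsigned int or4(unsigned int a, unsigned int b)   { return a | b; }
unsigned int xor4(unsigned int a, unsigned int b)  { return a ^ b; }
unsigned int not4(unsigned int a)                  { return ~a & MASK4; }
unsigned int nand4(unsigned int a, unsigned int b) { return ~(a & b) & MASK4; }
unsigned int nor4(unsigned int a, unsigned int b)  { return ~(a | b) & MASK4; }
unsigned int xnor4(unsigned int a, unsigned int b) { return ~(a ^ b) & MASK4; }
```

Calling them with 0xA (1010) and 0xC (1100) gives 0x8, 0xE, 0x6, 0x5, 0x7, 0x1 and 0x9, the same results as the Result column.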
2.1.2.5.1.7.7 Overflow

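Although a computer works with binary numbers, its capacity is limited; unlike mathematics, it cannot represent ideal, arbitrarily large integers.

So when a number grows beyond the range the computer can represent, an "overflow" occurs. On most computers, when an overflow happens the overflowing bits are thrown away and only a flag recording that an overflow occurred is kept.

Most of the time we try to avoid overflow as far as possible, because it makes results differ from what we expect. So when defining a variable, estimate the range of the data in advance and choose a suitable type for each kind of data.

But overflow is not always a bad thing; sometimes it brings a special advantage. Unsigned arithmetic that wraps around is used on purpose in hash functions and checksums, for example. The famous "Quake III" fast inverse square root is often mentioned alongside such tricks, although it relies on reinterpreting the bits of a floating-point number and a linear approximation from calculus rather than on overflow itself.

The way our computers represent negative numbers is also closely tied to overflow.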
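Overflow is easiest to observe with a small unsigned type, where C guarantees that the extra bit is simply discarded. The sketch below uses uint8_t from <stdint.h>; the function names are invented. Note that this guarantee covers unsigned types only: overflowing a signed integer is undefined behaviour in C.

```c
#include <stdint.h>

/* Adds two 8-bit unsigned values; the carry out of bit 7 is discarded,
   so the result wraps around modulo 256. */
uint8_t add_u8(uint8_t a, uint8_t b) { return (uint8_t)(a + b); }

/* Reports whether the mathematical sum would not fit in 8 bits. */
int add_u8_overflows(uint8_t a, uint8_t b) { return a > UINT8_MAX - b; }
```

So add_u8(255, 1) is 0 and add_u8(200, 100) is 44, and add_u8_overflows tells you in advance that both sums do not fit.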

2.1.2.5.1.7.8 2's Complement

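The data a computer can represent is limited. In the early days a CPU could only compute with 8-bit binary numbers, which is very small: only values from 0 to 255. Later, and up to today, computers still handle at most 64-bit data natively. As long as we consider only positive numbers this is not a big problem: within the integer range, adding directly gives the desired result, and even an overflow when adding two numbers can be dealt with fairly simply.

But once negative numbers have to be considered, things change. We must find a way to tell whether a number is positive or negative.

The most naive idea is to give up one bit of range and use that bit to mark the sign. This gives us the "sign-magnitude representation of integers" (Origin).

When the value to represent is positive, the sign-magnitude code equals the true value (True Value). When it is negative, the highest bit is written as 1. In other words, the highest bit serves as a sign bit that records whether the data is positive or negative.

Sign-magnitude causes serious trouble in arithmetic: when a negative number takes part, its highest bit is 1, and binary addition with a positive number may give a wrong result, namely a larger negative number.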

    0000'0000 0000'0111   (+7)
  + 1000'0000 0000'0111   (-7)
 -----------------------
    1000'0000 0000'1110   (-14)

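So for operations involving negative numbers we cannot use the plain sign-magnitude representation and simply set the highest bit of a negative number to 1.

An ideal representation of negative numbers must guarantee that a negative number plus its corresponding positive number comes out as 0 (with one bit overflowing out of the highest position).

To get there, we choose to invert every bit of the value as it stands. This gives us the "ones' complement" (1's Complement).

But ones' complement has a similar problem. It avoids getting a larger negative number when adding a positive and a negative number, yet a positive number plus its negative does not give the original 0 but all ones. This causes the problem of having both +0 and -0.

    0000'0000 0000'0111   (+7)
  + 1111'1111 1111'1000   (-7)
 -----------------------
    1111'1111 1111'1111   (-0)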

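So, since a number plus its negative does not give 0, we might as well add a 1: take the result of the ones' complement addition, add 1, let the overflow be discarded, and the final result is the true 0 we wanted.

To make this practical, the 1 is folded into the ones' complement representation itself. Thus we arrive at the "two's complement" (2's Complement).

Of course, this is a conclusion reached through practice; two's complement also has a deeper meaning.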
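The recipe "invert every bit, then add 1" can be tried out on 16-bit patterns like the ones in the diagrams. This is a sketch with invented function names; it works on unsigned values so that discarding the overflow bit is well defined.

```c
#include <stdint.h>

/* Negates a 16-bit pattern as described above: invert every bit
   (ones' complement), then add 1. The cast drops the overflow bit. */
uint16_t negate16(uint16_t x) { return (uint16_t)(~(unsigned int)x + 1u); }

/* Adds two 16-bit patterns, discarding any carry out of bit 15. */
uint16_t add16(uint16_t a, uint16_t b) { return (uint16_t)((unsigned int)a + b); }
```

Here negate16(7) is 0xFFF9, that is 1111'1111 1111'1001, and add16(7, negate16(7)) is exactly 0, with no second zero left over.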

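2.1.2.5.1.7.9 N's Complement

The N's complement is really the additive group of residue classes modulo N. For

$\mathbb{Z}_n = \mathbb{Z} \bmod n$, with the operation $(\mathbb{Z}, + \bmod n)$

closure and associativity hold, so we have the group of residues modulo N over $\mathbb{Z}$.

Given an n, there are n residue classes modulo n, and for a, b with $\gcd(n, a) = 1$, the values $a \times r_i + b$ form a complete residue system modulo n.

For $a < n$, taking $b = n - a$ gives $a + b \equiv 0 \pmod{n}$. If we define $-a \equiv n - a$, each negative number is congruent modulo n to a corresponding positive number, and n is the complement constant.

With $-a$ the additive inverse of $a$, complementing with respect to $M$ gives $-a = M - a$, where $M = 10^{n}$ written in the base in use. For this $M$, $-0 = M$ and $0 = 0$ are congruent modulo $M$.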

2.1.2.5.1.7.10 Bitwise Shift

Apart from the regular bitwise operations, there are some special ones as well. Can you imagine shifting every digit of a number?

We have already mentioned floating-point numbers, right? You might think of floating point as a shift of digits, but a floating-point number really only moves the position of the point.

In a bitwise shift the point stays fixed at position 0, and all digits move directly right or left.

  • Logical Shift Right: shifts all digits right relative to position 0. Every digit pushed below position 0 is discarded, and the higher positions are padded with 0.

     | 0000'1001 0010'1111 | =>
    0 | 0000'1001 0010'111 | 1 =>
    00 | 0000'1001 0010'11 | 11 =>
    ...
  • Arithmetic Shift Right: mostly the same as a logical shift right, but the higher positions are padded according to the sign bit.

     | 0000'1001 0010'1111 | =>
    0 | 0000'1001 0010'111 | 1 =>
    00 | 0000'1001 0010'11 | 11 =>
    ...

    For positive numbers it behaves exactly like the logical one.

     | 1111'1001 0010'1111 | =>
    1 | 1111'1001 0010'111 | 1 =>
    11 | 1111'1001 0010'11 | 11 =>
    ...

    For negative numbers the padding digit is 1 instead.

  • Shift Left: shifts all digits left relative to the highest position. Every digit pushed above the highest position is discarded, and position 0 is padded with 0.

       <= | 0000'1001 0010'1111 |
     <= 0 | 000'1001 0010'1111 | 0
    <= 00 | 00'1001 0010'1111 | 00
    ...

Operations  Description  Form    Comment
<<          SHL          A << B
>>          SHR          A >> B  Different machines may choose a different SHR method, logical or arithmetic

That is a brief look at the bitwise shift operations. You may notice that shift operations really just perform multiplication and division.

How?

SHL multiplies a number by $2^n$, and SHR divides a number by $2^n$.

All discarded digits count as overflow.
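The diagrams can be checked with 16-bit unsigned values, for which both shifts are logical. The sketch below uses invented function names and assumes the shift count n is less than 16; for negative signed values, whether >> is logical or arithmetic is left to the implementation.

```c
#include <stdint.h>

/* Logical shifts on a 16-bit unsigned value. Bits pushed past either end
   are discarded and zeros are shifted in. n must be less than 16. */
uint16_t shl16(uint16_t x, unsigned int n) { return (uint16_t)((unsigned int)x << n); }
uint16_t shr16(uint16_t x, unsigned int n) { return (uint16_t)(x >> n); }
```

Taking the pattern from the diagrams, 0000'1001 0010'1111 is 0x092F: shr16(0x092F, 1) gives 0x0497 and shl16(0x092F, 1) gives 0x125E, while shl16(3, 4) is 48, that is 3 times 2 to the power 4.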

2.1.2.5.1.8 Syntax

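C, being a language for communicating with a computer, has its own set of syntax rules.

In the previous sections we also saw that code not written according to those rules runs into "illegal" errors.

So it is worth getting a systematic view of the syntax rules of C.

Here is our example program: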

/// file: main.c
#include <stdio.h>

// main function, the entry (the envp parameter is a common extension)
int main(int argc, char* argv[], char* envp[]) {
  int integer_value;
  float float_value = 1.0;

  printf("Hello, World!\n" /* comment can appear anywhere */);
  integer_value = 10;

  printf("Calculate a + b: %d + %f = %f", integer_value, float_value, float_value + integer_value);
  return 0;
}

/* foo function, void parameter and empty body */
void foo(void) {
  // do sth.
}

In the program above, several lines contain things we have not met before.

We will explain them all in this chapter.

2.1.2.5.1.8.1 Statements

The first thing I'd like to give you is the definition of a statement.

A C program is composed of statements, just as mentioned before.

Statements define the operations the program will execute. Each statement does something.

According to the C Programming Language Standard, every statement in C needs to end with a semicolon (';'), unless it is explicitly listed as not needing one.

For example, we can see,

  int integer_value;
  float float_value = 1.0;
  printf("Hello, World!\n");
  integer_value = 10;

these are all statements.

Multiple statements can also be written on the same line. You may see this:

int i; i = 1;

Here we have written two statements, int i; and i = 1;

So it is not necessary to put a line break between two different statements.

Line breaks are added for beauty and clarity.

Also, because the end of a statement is determined only by the semicolon, one statement may be written across multiple lines.

int
i
=
10
;

This is legal as well.

But we do not write code this way. A more common use of this feature is:

int i = 10,
    j = 20;
2.1.2.5.1.8.2 Expression

Now that we know about statements, another important part of a C program is the expression.

An expression is a form that combines different operations.

The most basic expressions we use in programs are calculations.

1 + 2
i = 0
printf("Hello, World")

These are all expressions, and each finally yields the result of its operations.

A statement may contain expressions, but an expression on its own does not make a statement.

Also, most of the time an expression produces a value that can be used in the rest of the program.

Furthermore, expressions can be nested.

printf("%d", 1+1)

Here we have two expressions: the smaller one, 1+1, and the larger one that wraps it, printf("%d", ~).

Once we add a semicolon after it, the whole expression becomes a statement.

printf("%d", 1+1);

And it is ready to do something particular.

You may imagine that, since a function call is a valid expression and can be turned into a statement, we can also add a semicolon after a calculation to get a statement.

1;
8*2;

But such statements are meaningless.

2.1.2.5.1.8.3 Code Block

When programming, we sometimes want several operations to be executed together, as a single unit.

For that we need code blocks, or "compound statements". They are statements grouped together and wrapped in one pair of braces. For example:

{
  int x;
  x = 1;
}

In the rest of the program they are treated as a group, as one large statement.

And no semicolon is needed after the closing brace.

2.1.2.5.1.8.4 Empty Lines & Space

Spaces in code are not only for beauty; we need them to separate different syntax objects.

For example, why do we always need a space between int and i? Because if we dropped it, the compiler would only see inti, which is not a valid name here, nor anything else.

It is the same reason we must write spaces between different words. (Even in Chinese.)

So whenever we can say that a space does not change the structure of our code, that space can be deleted.

Empty lines, lines that contain no code, behave much like spaces. If an empty line is not required where it stands, it is there only for beauty and can be removed.

The example here shows when spaces and empty lines can be discarded.

int x = 1;
// Equals to
int x=1;
2.1.2.5.1.8.5 Comment

Comments are another thing that does not affect our code. When the compiler meets a comment, it simply ignores it, which means a comment behaves like a space in our code.

There are two ways to write comments.

  • /* ... */ : multi-line comment, also usable as an inline comment; anything between /* and */ is ignored.
  • // ... : one-line comment; anything that follows on the line is ignored.

The code above gives a fairly simple picture of how comments work.

2.1.2.5.1.9 Variables & Variable space

Here we come to the most important part of a program. We will learn what a variable is, how it is defined, and what operations can be done on it.

First of all, let us look at the relation between variable and value.

2.1.2.5.1.9.1 Data, Variable, Value

Data is something that represents something and carries some information; it is always the object we manipulate in a program.

But how can we describe a piece of data? We use something called a "variable", a slot with the space needed to store the data.

In general, then, a variable is a space, a slot, that can store a value carrying some specified data.

2.1.2.5.1.9.2 Definition

Before we use a concrete variable in our program, we must define it.

The basic forms of variable definition are listed below:

<variable-type> <variable-name>;
[<decorator>] <variable-type> <variable-name> [= <literal-value>];

We also have another way to declare a variable:

extern <variable-type> <variable-name>;

From all of these we can see that a variable is declared in the form "type name;".

Here the type can be any type specifier mentioned in the types section above.

For example,

int a;
int b;

Furthermore, once we have learnt about structures, enumerations, unions and functions, we will have more forms of types.

2.1.2.5.1.9.3 Variable Name

One must-have element of a variable definition is the type. The other is the variable name.

Once we have defined a variable, we can refer to it by its name.

Just like calling someone by their name.

Variable names in the C programming language must follow some rules:

  1. start with '_' or a letter ('$' is also accepted by many compilers as an extension),
  2. have no space inside,
  3. continue with '_', letters and digits (and '$' where the compiler allows it),
  4. have a total length of at most 63 characters (the standard only guarantees that the first 63 characters are significant),
  5. not duplicate any other name defined before, and not be the same as a keyword like 'int'.

Keywords are words reserved for special use in C programs, for example int, if, continue. The C programming language also reserves some names for future use. For those names, although it may be possible to use them, it is not encouraged.

Here are the main keywords and reserved names:

auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, inline, int, long, register, restrict, return, short, signed, sizeof, static, struct, switch, typedef, typeof, union, unsigned, void, volatile, while, _Generic

Beyond the keywords that cannot be used, there are extra naming rules.

Names that start with two underscores ('_'), and names that start with one underscore followed by a capital letter, are reserved for the compiler.

Names that start with two underscores and end with two underscores are reserved for the system-wide standard library.

Names that start with one underscore and a lower-case letter and end with one underscore are reserved for libraries.

Names written entirely in capital letters, with words separated by underscores, denote constants.

2.1.2.5.1.9.4 Initialize

Once you have finished a declaration, that does not mean you have finished the variable definition.

A variable must be initialized before it is put to use. Otherwise you may get an arbitrary value when you try to read it.

The first assignment to a variable is called "initialization".

Only with both declaration and initialization can we say the variable definition is complete.

From the forms listed above, we can see that initialization can be done together with the declaration.

int a = 10;
2.1.2.5.1.9.5 Assignment Operations

Assignment is an operation specific to variables.

The simplest one is written like an equation in maths. We simply call it the assignment operation.

Operations  Description  Form
=           Assignment   A = val

After the program finishes an assignment operation, the value stored in the variable has been replaced.

int i = 20;
printf("%d", i);
// => 20
i = 9;
printf("%d", i);
// => 9

So this is the meaning of "variable": a space that can store a value. An assignment operation finds that space and replaces the value inside. It is like a drawer that can hold exactly one thing: you may put one thing inside, and you may clear the drawer and put a new one inside.

2.1.2.5.1.9.6 Compound Assignment Operations

Beyond the regular assignment operation, we have some advanced ones. You can combine the assignment operation with other mathematical operations, which gives the compound assignment operations.

Operations  Description                 Form        Equivalent Form
+=          Addition Assignment         A += val    A = (typeof(A))(A + val)
-=          Subtraction Assignment      A -= val    A = (typeof(A))(A - val)
*=          Multiplication Assignment   A *= val    A = (typeof(A))(A * val)
/=          Division Assignment         A /= val    A = (typeof(A))(A / val)
%=          Modulus Assignment          A %= val    A = (typeof(A))(A % val)
^=          Bitwise XOR Assignment      A ^= val    A = (typeof(A))(A ^ val)
|=          Bitwise OR Assignment       A |= val    A = (typeof(A))(A | val)
&=          Bitwise AND Assignment      A &= val    A = (typeof(A))(A & val)
<<=         SHL Assignment              A <<= val   A = (typeof(A))(A << val)
>>=         SHR Assignment              A >>= val   A = (typeof(A))(A >> val)

The increment and decrement operators are much the same as addition assignment and subtraction assignment:

int a = 0;
a++;    // a => 1
a += 1; // Equivalent, a => 2
--a;    // a => 1
a -= 1; // a => 0
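To see several compound assignments in a row, here is a sketch with two invented functions, one for the arithmetic forms and one for the bitwise and shift forms. Each comment gives the long form of the step.

```c
/* Applies a sequence of arithmetic compound assignments to a. */
int arithmetic_steps(int a) {
    a += 5; /* a = a + 5 */
    a *= 2; /* a = a * 2 */
    a -= 4; /* a = a - 4 */
    a /= 3; /* a = a / 3, integer division */
    a %= 4; /* a = a % 4 */
    return a;
}

/* The same idea with the bitwise and shift forms. */
unsigned int bitwise_steps(unsigned int x) {
    x |= 0x5u; /* set bits 0 and 2 */
    x &= 0xCu; /* keep only bits 2 and 3 */
    x ^= 0xFu; /* flip the low four bits */
    x <<= 2;   /* multiply by 4 */
    x >>= 1;   /* divide by 2 */
    return x;
}
```

Starting from 1, arithmetic_steps passes through 6, 12, 8, 2 and ends at 2; starting from 0xA, bitwise_steps passes through 0xF, 0xC, 0x3, 0xC and ends at 0x6.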
2.1.2.5.1.10 Type Conversion

As mentioned before, C is a typed language, and variables of different types occupy different amounts of space.

So, to use a variable of type int as a long, we must convert its value to type long. The way to achieve this is called type conversion.

In the types section we learnt about type boost, which is a special kind of automatic type conversion. Automatic conversion always goes from a smaller range to a larger one, which is why we also need forced (explicit) type conversion.

To convert a value from one type to another, write the target type in parentheses before the expression.

(int)10ll; // same as 10
(char)12;  // same as '\14'

char c;
int i = 3000;
c = (char)i;

But forced type conversion has a serious problem: it may lose information. Converting from int to char goes from a large range to a smaller one, and it simply discards the higher part of the int value. Converting from short to int is the opposite case: all the data is placed in the lower part of the int and everything is fine.

For example,

  Short: 0010'0000 1000'0011 =>
  Char:  1000'0011
  Int:   0000'0000 0000'0000-0010'0000 1000'0011

This may cause some unexpected results.

Conversion from real numbers to integers has the same problem: all digits after the decimal point are dropped directly.
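The three cases can be sketched with fixed-width types, which make the bit counts explicit. The function names are invented, and unsigned types are used for the narrowing case because converting an out-of-range value to a signed type such as char is implementation-defined.

```c
#include <stdint.h>

/* Keeps only the low 8 bits of a 16-bit value; the high byte is discarded. */
uint8_t low_byte(uint16_t value) { return (uint8_t)value; }

/* Widening copies the value into the low part of the larger type. */
uint32_t widen(uint16_t value) { return (uint32_t)value; }

/* Converting a real number to an integer drops the fractional part. */
int drop_fraction(double value) { return (int)value; }
```

Using the pattern from the diagram, low_byte(0x2083) is 0x83 while widen(0x2083) is still 0x2083; drop_fraction(3.99) is 3 and drop_fraction(-3.99) is -3, so the conversion truncates rather than rounds.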

2.1.2.5.1.11 Input And Output

Programs does not only calculation, but also have to tell the result. Thus input and output utilities are indispensable.

Most useful input and output function are provided by printf printf and scanf scanf function in C.

2.1.2.5.1.11.1 printf printf

printf printf , stand for "print with format", a kind of format output method.

So, basically, the function of printf printf is to display some information on screen. And advanced functions are format output string.

2.1.2.5.1.11.1.1 Output

Most basic usage of printf printf is written as following:

printf("output string")
printf("output string")

Anything inside the quotation marks, which delimit the string, is displayed as is, except '%' and escape sequences such as '\n'.

For example, the printf here prints "output string" to the terminal, the black-background window on your computer.

The name "terminal" comes from a piece of hardware used long ago.

One thing you must notice is that the example shown here is just an expression, not a statement. To make it work, you have to add a semicolon, ';', after the whole expression.

In most cases, the system starts a new line of output on a carriage return, a line feed, or both. But printf never adds any of these after the content has been printed. So, to make the output look normal, you need to add a newline mark at the end of the string:

printf("string with new line mark at end\n")

Besides the end of a line, a newline mark can also be placed inside the string.

printf("string\nwith new line mark inside\n")

This does the same as the following:

printf("string\n");
printf("with new line mark inside\n");

(Why do we add a semicolon at the end of each line? Because two separate expressions can never be written in one statement in this form.)

2.1.2.5.1.11.1.2 Placeholder & format

What about the advanced features?

The format feature is provided by placeholders. Remember the '%' mentioned before? The percent sign introduces a placeholder, and that is why it cannot be printed directly with printf. To print '%' on the screen, write it as "%%" in the format string, the first argument given to printf.

Since printf means "print with format", placeholders must do more than keep the percent sign from being printed literally. So, let us investigate placeholders further.

As we know, the C programming language classifies data into different types. Placeholders therefore take different forms so that printf can distinguish them. The part that selects the type is called the "type specifier". A full placeholder is written according to this syntax:

<placeholder> ::= '%' [flags] [width] [.precision] [length] <type specifier>
flags         ::= '-' | '+' | space | '#' | '0'
width         ::= <number> | '*'
precision     ::= <number> | '*'
length        ::= 'h' | 'l' | 'll' | 'L'

Looks complex? Just take a quick glance and move on; examples say more than the standard:

type specifier | Description | Form | Expected Data
a, A | Output reals in hexadecimal | %a | Reals: float, double
d | Output integer in decimal | %d | Integers: char, short, int
o | Output integer in octal | %o | Integers: char, short, int
x, X | Output integer in hexadecimal | %x | Integers: char, short, int
u | Output unsigned integer in decimal | %u | Unsigned integers: unsigned char, unsigned short, unsigned int
f | Output reals in decimal | %f | Reals: float, double
e, E | Output reals in exponent form | %e | Reals: float, double
g, G | Output reals in the shorter form | %g | Reals: float, double
c | Output character | %c | Character: char
s | Output character string | %s | String: char[]
p | Output address | %p | Pointer: *

And their long variants:

type specifier | Description | Form | Expected Data
ld | Output integer in decimal | %ld | Integers: long
lo | Output integer in octal | %lo | Integers: long
lx, lX | Output integer in hexadecimal | %lx | Integers: long
lu | Output unsigned integer in decimal | %lu | Unsigned integers: unsigned long
lld | Output integer in decimal | %lld | Integers: long long
llo | Output integer in octal | %llo | Integers: long long
llx, llX | Output integer in hexadecimal | %llx | Integers: long long
llu | Output unsigned integer in decimal | %llu | Unsigned integers: unsigned long long
lf | Output reals in decimal (same as %f in printf) | %lf | Reals: double
le, lE | Output reals in exponent form | %le | Reals: double
lg, lG | Output reals in the shorter form | %lg | Reals: double
% | Output % | %% | None

Here is the flags part:

flags | Description | Form | Expected Data
- | Align left; the default is right | %-d | None
+ | Force a '+' sign for positive numbers; by default it is not shown | %+d | None
(space) | Insert a space before a positive number | % d | None
# | Show '0', '0x' or '0X' with 'o', 'x', 'X'; force the decimal point with 'e', 'E', 'f'; keep trailing zeros with 'g', 'G' | %#x | None
0 | Pad with 0 instead of spaces | %05d | None

Width, .precision and length:

field | Description | Form | Expected Data
(number) | Minimum number of characters to print, padded with spaces; output longer than this value is not truncated | %8d | None
* | Width is not written in the format string but taken from an extra argument placed before the argument to be formatted | %*d | Integer: char, short, int
.number | For integers (d, i, o, u, x, X): minimum number of digits, padded with leading 0; longer values are unaffected; with precision 0 the value 0 prints nothing. For e, E, f: digits after the decimal point. For g, G: maximum number of significant digits. For s: maximum number of characters to print; by default all characters up to '\0' are printed. For c: no effect. A '.' with no number means precision 0 | %.10d, %.f | None
.* | Precision is not written in the format string but taken from an extra argument placed before the argument to be formatted | %.*d | Integer: char, short, int
h | Argument is a short, for i, d, o, u, x, X | %hd | None
l | Argument is a long, for i, d, o, u, x, X; a double, for f; a wide character, for c; a wide character string, for s | %ld | None
ll | Argument is a long long, for i, d, o, u, x, X | %lld | None
L | Argument is a long double, for e, E, f, g, G | %Lf | None

And printf returns the total number of characters it printed.
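
The flag, width, and precision fields are easiest to check by looking at the exact characters produced. The sketch below assumes snprintf, the standard sibling of printf that writes into a buffer instead of the screen and understands the same placeholders; the helper name format_demo is made up for illustration.

```c
#include <stdio.h>

/* Formats one int twice and one double into buf. Returns the number of
   characters produced, the same count printf would return for this format. */
int format_demo(char *buf, size_t size, int n, double x) {
    /* %+05d : always show the sign, pad with zeros to 5 characters
       %-6d  : align left inside 6 characters
       %8.3f : 8 characters wide, 3 digits after the decimal point */
    return snprintf(buf, size, "[%+05d][%-6d][%8.3f]", n, n, x);
}
```

For n = 42 and x = 3.14159 the buffer holds "[+0042][42    ][   3.142]" and the returned count is 25.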

You are now able to print the ASCII table using printf:

#include <stdio.h>

int main(void) {
  for (int i = 0; i < 128; i ++) {
    printf("ASCII: %5d, Char: %c;\n", i, i);
  }
  return 0;
}

The declaration of the printf function is written as:

int printf(const char * fmt, ...);

So, you can call it in these forms:

printf("format string")
printf("format string", arguments)
printf("format string", arguments, arg2)
printf("format string", arguments, arg2, arg3)
...
2.1.2.5.1.11.2 scanf

Now that we have learnt the output part, it is also necessary to have a glance at the input part.

scanf is used roughly like printf, except for how the arguments are passed: scanf takes the addresses of the variables to fill. scanf stands for "scan formatted", so it needs placeholders just as printf does.

Placeholders are written in this form:

<placeholder> ::= '%' ['*'] [width][modifiers] <type specifier>

Somewhat like printf, right?

part | Description | Form | Expected Data
* | Discard the input: match data of the given type but store nothing | %*d | None
width | Maximum number of characters to read | %8d | None
modifiers | Length modifiers for the type specifier, as in printf | %ld | None
type | The type the data is scanned as | %d | None

type specifier | Description | Form | Expected Data
a, A | Real numbers | scanf("%a", &f) | float
c | Characters; if a width is given, reads exactly that many characters into a char array | scanf("%c", &c), scanf("%3c", buf) | char
d | Integer written in decimal; '+' or '-' is optional | scanf("%d", &i) | int
ld | Integer written in decimal; '+' or '-' is optional | scanf("%ld", &l) | long
lld | Integer written in decimal; '+' or '-' is optional | scanf("%lld", &ll) | long long
e, E, f, F, g, G | Real numbers; '+' or '-' and the 'e' exponent part are optional | scanf("%f", &f) | float
i | Integer; the base is detected from the prefix (0x for hexadecimal, 0 for octal) | scanf("%i", &i) | int
o | Integer written in octal | scanf("%o", &u) | unsigned int
s | String, delimited by blanks | scanf("%s", s) | char[]
u | Unsigned integer written in decimal | scanf("%u", &u) | unsigned int
x, X | Integer written in hexadecimal | scanf("%x", &u) | unsigned int
p | Pointer | scanf("%p", &p) | void *
[] | Character ranges, a simplified regular expression | scanf("%[1-9]", s) | char[]
% | Matches a literal % | scanf("%%") | None

Sample question: A+B Problem:

#include <stdio.h>

int main(void) {
  int a, b;
  scanf("%d%d", &a, &b);
  printf("%d + %d = %d\n", a, b, a + b);
  return 0;
}
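
The '*' part and the return value can be tried without typing anything. The sketch below assumes sscanf, the standard variant of scanf that reads from a string instead of the keyboard and accepts the same placeholders; the function name read_first_and_third is made up for illustration.

```c
#include <stdio.h>

/* Reads three integers from text but keeps only the first and the third:
   %*d matches a number and stores nothing.
   Returns how many values were actually stored (2 on success). */
int read_first_and_third(const char *text, int *first, int *third) {
    return sscanf(text, "%d %*d %d", first, third);
}
```

For the text "12 34 56" this stores 12 and 56 and returns 2. For "12 abc" only the first value can be read, so it returns 1, which is why checking the return value is a good habit.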
2.1.2.5.1.12 Conditional Statement

A program is not only a tool for calculating; it also helps people solve problems that require decisions.

So, scientists introduced the conditional statement, which decides what to do according to a condition.

2.1.2.5.1.12.1 If

An if statement has the form:

if (condition) statement

When the condition expression evaluates to true, the statement part is executed.

if (x < y)
  printf("x less than y");

As you can see, x < y is the condition expression, and if x is indeed less than y, the program outputs the message.

But this is only the simplest case. What if we want to execute multiple statements within an if statement?

Remember the code block? A code block composes several statements into one. So:

if (max < x) {
  max = x;
  printf("x larger than current max, update max");
}

Here, we execute two statements when x is larger than the current max value.

2.1.2.5.1.12.2 If-Else

Besides the plain "if" statement, sometimes we also need an "else" part.

if (condition)
  then-statement
else
  else-statement

Just as with the if statement, when the condition is not 0, that is, when it holds, the then-statement is executed; otherwise the else-statement is executed.

You may also need to distinguish several cases, which can be written like this:

if (cond1)
  then1-statement
else if (cond2)
  then2-statement
else if (cond3)
...
else
  else-statement

These are simply nested if-else statements: each "else if" is a new if statement placed in the else part of the previous one. This form is for readability, but you can also write it like this:

if (cond1) {
  then1
} else {
  if (cond2) {
    then2
  }
  ...
}

Very clear.
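
As a concrete instance of the else-if chain, here is a minimal sketch that classifies two numbers. The function name compare_text is made up for illustration, and it returns the message instead of printing it so that the result is easy to check.

```c
/* Returns a message describing how x relates to y.
   Exactly one of the three branches is taken. */
const char *compare_text(int x, int y) {
    if (x < y)
        return "x less than y";
    else if (x > y)
        return "x greater than y";
    else
        return "x equal to y";
}
```

The conditions are tested from top to bottom, so the final else is reached only when both earlier conditions are false.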

2.1.2.5.1.12.3 Ternary if-else operator

Ternary operator

Though an if-else statement is enough in most cases, it is still a statement, not an expression. Thus, in some corner cases, writing with if-else results in more lines of code and more complexity.

So C provides the ternary if-else operator. With this operator you get an expression, which you can then combine with other expressions.

The ternary if-else looks like this:

condition ? then : else

When the condition is true, the then part is evaluated; when it is false, the else part is evaluated. The value of the evaluated part becomes the value of the whole expression.

So, you may write:

int i = 10;
i = i - 100 < 0 ? 0 : i - 100;

Or, in C++, you may find that you can write the following. (We must mention C++ explicitly here, because this style of ternary is not allowed in pure C, yet many programmers do not distinguish between C and C++.)

int i = 0;
int j = 10;
(i < j ? i : j) = 1;

(The second case is correct in C++ because, when both branches are variables of the same type, the conditional expression refers to the selected variable itself and so can be assigned to. In C the result is only a value, so the assignment is rejected.)

Both are correct, but the second one is discouraged.
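
Because the ternary operator forms an expression, it can be returned directly. The following sketch wraps the first example in a function; the name subtract_floor_zero is made up for illustration.

```c
/* Subtracts cost from value, but never lets the result go below zero. */
int subtract_floor_zero(int value, int cost) {
    return value - cost < 0 ? 0 : value - cost;
}
```

subtract_floor_zero(10, 100) gives 0, while subtract_floor_zero(150, 100) gives 50.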

2.1.2.5.1.12.4 Switch-Case

In addition to the if-else statement, we also have the switch-case statement.

switch (object) {
  case label:
    statements
  case label:
  ...
}

A label is either "case literal-value" or "default", and braces are not necessary when one case has multiple statements. Each label marks an entry point: when the object matches a label, execution starts from the position of that label and continues until it meets a break statement.

Then, a legal switch-case statement may look like:

int i = 4; // try other values
switch (i) {
  case 1:
  case 2:
    printf("less than 3\n");
    break;
  case 4:
    printf("larger than 3\n");
    // no break: execution falls through to case 5
  case 5:
    printf("larger than 4\n");
    // no break: execution falls through to default
  default:
    printf("do nothing\n");
    break;
}
2.1.2.5.1.12.4.1 Break statement

But what does the break statement do?

The break statement has two uses. The one here is to jump out of the execution sequence of a switch-case statement.

When C finds that the object matches a label, it executes every statement after that label until it meets the closing brace. In some cases, actually in most cases, you do not want that. break stops the whole process: when a break statement is executed, control simply jumps out of the switch-case statement, and the remaining statements inside are not executed.

Though a break statement in switch-case is not mandatory, it is a good habit to add a break for each label.
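
Sharing one body among several labels is the most common deliberate use of a missing break. A minimal sketch, with the made-up function name days_in_month and leap years ignored for simplicity:

```c
/* Returns the number of days in a month of a non-leap year,
   or 0 when month is not in 1..12. */
int days_in_month(int month) {
    int days;
    switch (month) {
        case 4: case 6: case 9: case 11:   /* four labels, one body */
            days = 30;
            break;
        case 2:
            days = 28;
            break;
        case 1: case 3: case 5: case 7: case 8: case 10: case 12:
            days = 31;
            break;
        default:                           /* anything else */
            days = 0;
            break;
    }
    return days;
}
```

Every body ends with break, so exactly one assignment to days is executed for any month.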

2.1.2.5.1.13 Loop

What if you want to execute the same, or nearly the same, statements many times? Here we need loops.

Loops are statements that execute other statements repeatedly according to some condition.

2.1.2.5.1.13.1 While

The while loop looks similar to the if statement,

while (condition)
  loop-body

and works similarly as well. While the condition is true, the loop-body is executed again and again.

Furthermore, like the if statement, the body of a while loop is still a single statement. If you want multiple statements to be executed, you must add braces.

while (1) {
  printf("infinite loop\n");
}
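
A loop is usually meant to end, so the body has to change something that the condition tests. A minimal sketch, with the made-up function name count_halvings:

```c
/* Counts how many times n can be halved (integer division)
   before it reaches 0. */
int count_halvings(int n) {
    int count = 0;
    while (n > 0) {      /* tested before every round */
        n = n / 2;       /* the body changes what the condition tests */
        count = count + 1;
    }
    return count;
}
```

For n = 8 the values are 8, 4, 2, 1, 0, so the result is 4. For n = 0 the condition is false at the start and the body never runs.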
2.1.2.5.1.13.2 For

The for loop is another type of loop, though the name "for" may not make its purpose very clear,

for (initial; condition; update)
  loop-body

A for loop always has four parts.

The initial part lets you define loop variables and initialize them inside the loop. The condition part is the same as in the while loop: if it is true, the body is executed; otherwise the loop ends. The loop-body, as with if and while, is executed when the condition holds. Finally, after the loop-body finishes, the for loop runs the update part to update the loop variables.

for (int i = 0; i < 10; i ++) {
  printf("%d", i);
}

Another important point is that, of the four parts of a for loop, the initial, condition, and update parts can all be empty. Thus, in some special cases, you may find

for (;;)
  body

which is an infinite loop.
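
To see how the four parts map onto a while loop, here is a minimal sketch that computes the same sum both ways. The function names sum_to and sum_to_while are made up for illustration.

```c
/* Sum of 1 + 2 + ... + n written with a for loop. */
int sum_to(int n) {
    int sum = 0;
    for (int i = 1; i <= n; i++) {   /* initial; condition; update */
        sum += i;                    /* loop-body */
    }
    return sum;
}

/* The same computation with the parts spread around a while loop. */
int sum_to_while(int n) {
    int sum = 0;
    int i = 1;           /* initial */
    while (i <= n) {     /* condition */
        sum += i;        /* loop-body */
        i++;             /* update */
    }
    return sum;
}
```

Both functions return 55 for n = 10.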

2.1.2.5.1.13.3 Do-While

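But what if we need to execute the body at least once?

Then we need the do-while loop.

do {
  body
} while (condition);

Unlike the other loops, the do-while loop tests its condition after the body and must end with a semicolon. The body is still a single statement, so braces are strictly required only for multiple statements, but they are almost always written.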
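
Counting the digits of a number is a case where the body must run at least once, because 0 still has one digit. A minimal sketch, with the made-up function name count_digits:

```c
/* Counts the decimal digits of n; 0 counts as one digit. */
int count_digits(int n) {
    int digits = 0;
    do {
        digits++;        /* runs once even when n is 0 */
        n /= 10;
    } while (n != 0);    /* tested after the body */
    return digits;
}
```

count_digits(0) gives 1 and count_digits(12345) gives 5. A plain while loop with the same condition would return 0 for the input 0.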

2.1.2.5.1.13.4 Break

This is the other use of break: when a break statement is used within the body of a loop, control jumps out of the whole loop. Everything after the break is skipped, even the update part of a for loop.

This is similar to switch-case.

2.1.2.5.1.13.5 Continue

Sometimes you only need to skip the rest of the body without jumping out of the loop; then you need the continue statement.

When continue is executed, the loop simply goes on to the next round: it does the update, tests the condition, and starts a new execution of the body.
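
The difference between the two statements shows up when both appear in one loop. A minimal sketch, with the made-up function name sum_odd:

```c
/* Adds the odd numbers below limit, but stops as soon as
   the running sum exceeds cap. */
int sum_odd(int limit, int cap) {
    int sum = 0;
    for (int i = 0; i < limit; i++) {
        if (i % 2 == 0)
            continue;    /* even: skip the rest of the body, go on to i++ */
        sum += i;
        if (sum > cap)
            break;       /* leave the loop at once; i++ is not run again */
    }
    return sum;
}
```

sum_odd(10, 100) adds 1 + 3 + 5 + 7 + 9 and returns 25. sum_odd(10, 8) stops after 1 + 3 + 5 and returns 9.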

2.1.2.5.1.14 Array

When we are dealing with a small amount of data, defining several variables is enough. But how about a sequence of data?

For example, reading the scores of over 500 students and sorting them.

The average and the maximum can be computed with only one or two variables, but sorting requires storing all the data.

Arrays are linear, contiguous data structures that store values of the same type.

The definition of a one-dimensional array is written as follows:

type name[length];

Furthermore, an array can be multi-dimensional.

type name[length][length];
type name[length][length][length];
...

Once we define an array, it holds length elements, indexed from 0 to length - 1. You may access them using an index:

name[idx];

Each element can be seen as a regular variable whose type is the type used to define the whole array.

We can then traverse the array using a loop:

int arr[10];
for (int i = 0; i < 10; i ++) {
  arr[i] = i;
}

Then, how can we initialize an array?

There are two main ways:

type name[] = {value1, value2, ...};
type name[length] = {value1, value2, ...};

type name[][length] = {value1, value2, ..., value6, ...};
type name[length][length] = {value1 ...};
type name[length][length] = {{value1, ...}, {value_length, ...}};
...

One way is to omit the length and just wrap the initial values in braces; the array then has as many elements as there are initial values. The other way is to specify the length and also provide initial values wrapped in braces.

For multi-dimensional arrays, you must specify the length of every dimension except the first. You can write the initial values directly in one pair of braces, or separate the elements of each dimension with their own pair of braces.
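
The initialization forms can be sketched with concrete numbers. The function names sum_table and last_of_partial are made up for illustration.

```c
/* Sums a 2x3 table that is initialized row by row with nested braces. */
int sum_table(void) {
    int table[2][3] = {{1, 2, 3}, {4, 5, 6}};
    int sum = 0;
    for (int row = 0; row < 2; row++) {
        for (int col = 0; col < 3; col++) {
            sum += table[row][col];
        }
    }
    return sum;
}

/* When fewer values than elements are given, the rest are set to 0. */
int last_of_partial(void) {
    int values[5] = {7, 8};
    return values[4];
}
```

sum_table() returns 21, and last_of_partial() returns 0 because only the first two of the five elements were given a value.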

2.1.2.5.1.14.1 C Style String

Finally, we come to strings.

As we mentioned before, strings and characters have a special relationship. Actually, strings in the C programming language are arrays of char.

C treats a char array that ends with '\0' as a string.
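
Since the end of a string is marked only by '\0', its length has to be found by scanning for that mark. A minimal sketch of what the standard strlen function does; the name my_strlen is made up for illustration.

```c
#include <stddef.h>

/* Counts the characters before the terminating '\0'. */
size_t my_strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

my_strlen("hello") gives 5, although the literal occupies 6 bytes in memory: five letters plus the terminating '\0'.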

2.1.2.5.1.15 sizeof

Though it is possible to traverse arrays using literal lengths, it is not that convenient.

To simplify the operation, we can use the sizeof operator:

sizeof(type)
sizeof(variable)
sizeof(array)

The sizeof operator returns the total size of the target type, variable, or array in bytes. So, to get the number of elements in an array, we can write:

int len = sizeof(array) / sizeof(type);
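
A concrete sketch of the length formula, using a made-up array named scores and dividing by the size of its first element so that the element type does not have to be repeated:

```c
#include <stddef.h>

int scores[] = {90, 75, 60, 88};

/* Number of elements = total bytes / bytes per element. */
size_t scores_length(void) {
    return sizeof(scores) / sizeof(scores[0]);
}
```

scores_length() returns 4. This only works where the array itself is visible: once an array is passed to a function as an argument, the function sees a pointer, and sizeof gives the size of the pointer instead.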
2.1.2.5.1.16 Iterator

To traverse arrays, using an index variable idx is one possible method. The other way to achieve the goal is using an iterator.

int a[10];
for (int*p = a; p < a + 10; p ++) {
  *p = 1;
}

Here, we define p as an iterator for the array a, and with it we can iterate over the whole array.

The p here is called a pointer to int.

More detail will be covered in the Pointers section.
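
The same pattern works for reading as well as writing. A minimal sketch that sums the elements between two pointers; the function name sum_range is made up for illustration.

```c
/* Adds up the ints from begin up to, but not including, end. */
int sum_range(const int *begin, const int *end) {
    int sum = 0;
    for (const int *p = begin; p < end; p++) {
        sum += *p;       /* *p is the element p currently points to */
    }
    return sum;
}
```

For an array int a[4] = {1, 2, 3, 4}, the call sum_range(a, a + 4) returns 10. The end pointer a + 4 is only compared against and never dereferenced.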

2.1.2.5.1.17 Function

A function is a kind of contract: it accepts some inputs and generates outputs. Much like its mathematical form, the same input given to a function results in the same output. Furthermore, the notation of a function is almost the same as in math:

int func(int x);

You may read it as a function 𝑓: 𝑁 → 𝑁, or 𝑓(𝑥) ∈ 𝑁, 𝑥 ∈ 𝑁. And

float func(float a, float b);

may represent a function 𝑓: 𝑅 × 𝑅 → 𝑅, with 𝑓(𝑣) ∈ 𝑅, 𝑣 = (𝑎, 𝑏), 𝑎, 𝑏 ∈ 𝑅.

Formally, the input in the C programming language can be zero or more parameters, and the output is the so-called "return value". There are also ways to pass output values other than the regular return.

Ideally, a function does not affect anything outside itself; such a function is called a pure function. But in a normal program, functions need to perform operations other than calculation, for example I/O. Any operation that modifies memory or variables outside the function's own scope, or performs I/O, is defined as a side effect of the function.

In particular, some functions in the C programming language return nothing at all and only have side effects.
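
The distinction can be sketched with two small functions. All names here (square, call_count, report) are made up for illustration.

```c
#include <stdio.h>

/* Pure: the result depends only on the argument, nothing else changes. */
int square(int x) {
    return x * x;
}

int call_count = 0;   /* a variable outside the function's own scope */

/* Not pure: modifies call_count and performs output.
   The return type void means it returns nothing. */
void report(int x) {
    call_count++;
    printf("value: %d\n", x);
}
```

Calling square(5) always gives 25, however often it is called. Each call to report, in contrast, changes call_count and writes a line to the terminal.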

2.1.2.5.1.17.1 Definition

To get a brief understanding of functions in C, first look at the function declaration.

A function declaration works almost like a variable declaration, but its main purpose is to tell the compiler a function's name, return type, and parameters, rather than to allocate any space.

We call it a prototype.

<return-type> <function-name>(<parameters> ...);

Usually, prototypes are placed in headers.

For example, the prototype for a function add that produces the sum of two integers looks like:

int add (int a, int b);

Here we declare the function add, which accepts two arguments, corresponding to parameters a and b respectively.

Then, just as a variable must be initialized before it is referenced, a function must have a complete implementation before it can be called.

A function implementation looks roughly like the declaration, but with an extra function body:

<return-type> <function-name> (<parameters> ...) {
  <function-body>...
}

The body consists of regular statements, possibly including the return statement.

The purpose of the return statement is to tell the program which value is the return value of the function.

It is like the equals sign in 𝑓(𝑥,𝑦)=𝑥+𝑦.

Here we implement the function add:

int add (int a, int b) {
  return a + b;
}
2.1.2.5.1.17.2 Function Calling

Once a function has been defined, it can be used in our program with the function call syntax.

As we mentioned at the very beginning of this tutorial, a function call is written in this form:

<function-name> (<arguments> ...)

The arguments must match the parameters in order and type.

For example, if we have the function add defined before,

int add(int a, int b){
  return a + b;
}

Then we can use it like:

#include <stdio.h>

int main(void) {
  int a = 10;
  a = add(a, 20);
  printf("%d", a);
  return 0;
}

The first argument we provide for add is the integer variable a, which has the same type as parameter a. The second argument is the literal value 20; since any integer literal without a suffix is seen as an int in C, it also has the same type as parameter b. Thus, the function call is acceptable.

But what if we provide too few or too many arguments, or arguments of a mismatched type? With the prototype in scope, the compiler reports an error for a wrong number of arguments, and it either converts or rejects arguments whose types do not match.

2.1.2.5.1.17.3 Recursion

Since a function can be called within the body of other functions, it makes no sense to prevent a function from calling itself.

A function that calls itself is called a recursive function.

For example, the factorial function can be defined using recursion:

int factorial(int n) {
  if (n == 0) {
    return 1;
  } else {
    return n * factorial(n - 1);
  }
}

The basic structure of a recursive function is similar to that of a normal function; the only difference is that it calls itself within its body.

But since a recursive function could call itself infinitely many times, it must have a termination condition to stop further calls.

Here the if statement works as the termination condition. When n equals 0, the function returns 1 directly, without calling itself again.

2.1.2.5.1.17.4 Function Tail Call Optimization

In some cases, the last operation of a function is a call to another function; this is called a tail call.

If the last operation of a function is a call to itself, it is called tail recursion.

Normally, an infinite tail recursion results in a stack overflow, but with tail call optimization the compiler can reuse the current stack frame for the tail call and avoid that.
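
The factorial from the Recursion section is not tail recursive, because the multiplication happens after the recursive call returns. A minimal sketch of a tail-recursive variant carries the running product in an extra parameter; the names factorial_acc and factorial_tail are made up for illustration. Note that the C standard does not require tail call optimization; whether it happens depends on the compiler and its optimization settings.

```c
/* The recursive call is the very last operation;
   acc holds the product computed so far. */
unsigned long factorial_acc(unsigned int n, unsigned long acc) {
    if (n == 0) {
        return acc;
    }
    return factorial_acc(n - 1, acc * n);   /* tail call */
}

/* Wrapper that starts the accumulator at 1. */
unsigned long factorial_tail(unsigned int n) {
    return factorial_acc(n, 1);
}
```

factorial_tail(5) returns 120, the same value as the earlier factorial(5).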

A common way to put every call into tail position is Continuation-Passing Style.

2.1.2.5.1.17.4.1 Continuation-Passing Style

Continuation-Passing Style (CPS) is a style of programming where control is passed explicitly in the form of a continuation.
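
C has no closures, so the following is only a rough sketch of the idea, assuming a continuation can be modelled as a function pointer plus a context pointer. All names (continuation, add_cps, store_result) are made up for illustration.

```c
/* A continuation: a function that receives the result
   and decides what happens next. */
typedef void (*continuation)(int result, void *context);

/* CPS version of add: instead of returning a + b,
   it hands the sum to the continuation k as a tail call. */
void add_cps(int a, int b, continuation k, void *context) {
    k(a + b, context);
}

/* Example continuation: stores the result where context points. */
void store_result(int result, void *context) {
    *(int *)context = result;
}
```

With int out = 0, the call add_cps(2, 3, store_result, &out) leaves 5 in out. add_cps itself returns nothing: the "rest of the program" is whatever the continuation does.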

2.1.2.5.1.18 Assembly
2.1.2.5.1.18.1 Architecture
2.1.2.5.1.18.1.1 AMD64 (x86_64)
2.1.2.5.1.18.1.2 Aarch64 / arm64
2.1.2.5.1.18.1.3 MIPS / Loong
2.1.2.5.1.18.2 BUS
2.1.2.5.1.18.2.1 Bridges
2.1.2.5.1.18.3 CPU
2.1.2.5.1.18.4 Intel Syntax, AT&T Syntax
2.1.2.5.1.18.5 Memory Access
2.1.2.5.1.18.6 Commands
2.1.2.5.1.18.7 Direct Memory Access
2.1.2.5.1.19 Stack
2.1.2.5.1.19.1 Frames
2.1.2.5.1.19.2 Stack Variables, Local Variables
2.1.2.5.1.19.3 Recursion Function Expansion
2.1.2.5.1.20 Global Variables
2.1.2.5.1.21 Variable Scope
2.1.2.5.1.21.1 Dynamic Scope
2.1.2.5.1.21.2 Lexical Scope
2.1.2.5.1.21.2.1 Function Scope
2.1.2.5.1.21.2.2 Block Scope
2.1.2.5.1.22 Closure
2.1.2.5.1.23 Heap Space
2.1.2.5.1.23.1 Variable Allocation
2.1.2.5.1.24 Memory Management
2.1.2.5.1.24.1 Virtual Memory (OS)
2.1.2.5.1.25 Function Call
2.1.2.5.1.25.1 Function Stack
2.1.2.5.1.25.2 Function In Assembly
2.1.2.5.1.26 goto
2.1.2.5.1.27 User Defined Types
2.1.2.5.1.27.1 Struct
2.1.2.5.1.27.1.1 Bit Field
2.1.2.5.1.27.1.2 Simulate class Using Structure
2.1.2.5.1.27.1.3 Virtual Function Table
2.1.2.5.1.27.2 Enum
2.1.2.5.1.27.3 Union
2.1.2.5.1.28 Structure space, Memory Alignment & Offset
2.1.2.5.1.29 Pointers
2.1.2.5.1.29.1 Pointer offset, index & linked list
2.1.2.5.1.29.2 Array, Pointers Points To Continuous Memory
2.1.2.5.1.29.3 Function pointers
2.1.2.5.1.29.3.1 Form
2.1.2.5.1.29.3.2 Function As Function Pointer
2.1.2.5.1.29.3.3 Calling With Function Pointer
2.1.2.5.1.29.3.4 Simplified Function Call
2.1.2.5.1.29.4 Void Pointers
2.1.2.5.1.29.5 Pointer Convert
2.1.2.5.1.30 Pointer in Assembly
2.1.2.5.1.31 Exception
2.1.2.5.1.31.1 setjmp, longjmp
2.1.2.5.1.31.2 Try-Catch, Throw
2.1.2.5.1.31.3 SEH, Structured Exception Handling
2.1.2.5.1.31.4 Herbexception
2.1.2.5.1.31.5 Exception spread
2.1.2.5.1.31.6 Condition System
2.1.2.5.1.31.7 Continuous
2.1.2.5.1.32 Preprocessor
2.1.2.5.1.32.1 Header files, #include
2.1.2.5.1.32.2 Macro
2.1.2.5.1.32.2.1 C Style Macro
2.1.2.5.1.32.2.2 M4 Macro Language
2.1.2.5.1.32.2.3 C++ Template
2.1.2.5.1.32.2.4 Rust Procedure Macro
2.1.2.5.1.32.2.5 Rust Macro Rules
2.1.2.5.1.32.2.6 Macro Assembly, Pseudocode
2.1.2.5.1.32.2.7 Common Lisp Expansion Macro
2.1.2.5.1.32.2.8 Common Lisp Reader Macro
2.1.2.5.1.32.2.9 Scheme Hygienic Macro System
2.1.2.5.1.32.2.10 Scheme Syntax Rules
2.1.2.5.1.32.2.11 Scheme Syntax Case
2.1.2.5.1.32.2.12 Hygiene for the Unhygienic
2.1.2.5.1.32.3 Compiler Comments
2.1.2.5.1.32.4 #pragma
2.1.2.5.1.33 Meta-programming
2.1.2.5.1.34 Compiler
2.1.2.5.1.34.1 Compile Process
2.1.2.5.1.34.2 Compiler Driver
2.1.2.5.1.34.3 Assembler
2.1.2.5.1.34.4 Assemble
2.1.2.5.1.34.5 Assembly Code
2.1.2.5.1.34.6 Linker
2.1.2.5.1.34.7 Link
2.1.2.5.1.35 Executable File
2.1.2.5.1.35.1 Object
2.1.2.5.1.35.2 Executable
2.1.2.5.1.35.3 Executable File Format
2.1.2.5.1.35.3.1 Portable Executable (PE)
2.1.2.5.1.35.3.2 Executable Linkable Format (ELF)
2.1.2.5.1.35.3.3 Mach-O (Fat)
2.1.2.5.1.35.3.4 Common Object File Format (COFF)
2.1.2.5.1.35.3.5 Binary (Bin)
2.1.2.5.1.36 ABI
2.1.2.5.1.36.1 Function Call Conventions
2.1.2.5.1.36.1.1 __cdecl
2.1.2.5.1.36.1.2 __stdcall
2.1.2.5.1.36.1.3 __fastcall
2.1.2.5.1.36.1.4 thiscall
2.1.2.5.1.36.1.5 Microsoft 4-register fastcall __vectorcall
2.1.2.5.1.36.1.6 System V ABI syscall
2.1.2.5.1.36.2 Function Naming Convention
2.1.2.5.1.36.2.1 C Function Naming Convention
2.1.2.5.1.36.2.2 MSVC C++ Function Naming Convention
2.1.2.5.1.36.2.3 Rust Function Naming Convention
2.1.2.5.1.36.2.4 Common Lisp Naming Convention
2.1.2.5.1.36.3 Endian
2.1.2.5.1.36.4 Dynamic Linked Library
2.1.2.5.1.36.5 Static Linked Library
2.1.2.5.1.36.6 fPIE, fPIC
2.1.2.5.1.37 Multiple File Compile
2.1.2.5.1.37.1 Compile Unit
2.1.2.5.1.37.2 Object
2.1.2.5.1.38 Build Systems
2.1.2.5.1.38.1 C Project Management
2.1.2.5.1.38.2 Makefiles
2.1.2.5.1.38.3 AutoTools
2.1.2.5.1.38.4 CMake
2.1.2.5.1.38.5 VSXMake (VSProj)
2.1.2.5.1.38.6 XMake
2.1.2.5.1.39 Variable Decorator
2.1.2.5.1.40 asm volatile (assembly code : output operands : input operands : clobbers)
2.1.2.5.1.41 __attribute__((attribute))
2.1.2.5.1.42 _Generic
2.1.2.5.1.43 ..., va_start, va_arg, va_end Macro, stdarg.h
2.1.2.5.1.44 __VA_ARGS__
2.1.2.5.1.45 Variable Length Array
2.1.2.5.1.46 ASCII, EBCDIC, Unicode/UCS-2
2.1.2.5.2  From The C Programming Language To Theoretical Computer Science (Section II) [S2]
2.1.2.5.2.1 From the C programming language to Theoretical Computer Science
2.1.2.5.2.1.1 Object-Oriented Programming
2.1.2.5.2.1.2 Generic Types
2.1.2.5.2.1.2.1 Template
2.1.2.5.2.1.2.2 Types Erase
2.1.2.5.2.1.3 Inheritance
2.1.2.5.2.1.3.1 Class Object
2.1.2.5.2.1.3.2 Prototype Chain
2.1.2.5.2.1.4 Polymorphism
2.1.2.5.2.1.4.1 Interface
2.1.2.5.2.1.4.2 Trait
2.1.2.5.2.1.4.3 Duck Type
2.1.2.5.2.1.5 Encapsulation
2.1.2.5.2.1.5.1 Accessibility
2.1.2.5.2.1.6 Object System
2.1.2.5.2.1.6.1
2.1.2.5.2.1.7 Turing Machine
2.1.2.5.2.1.8 Lambda Calculus
2.1.2.5.2.1.9 First Order Function
2.1.2.5.2.1.9.1 Church numeral
2.1.2.5.2.1.10 Formal Verification
2.1.2.5.2.1.11
2.1.2.6 scheme
2.1.2.6.1  MIT 6.001: Structure and Interpretation of Computer Programs (SICP) [S1]

“ Computer science is not about computers, any more than astronomy is about telescopes, or biology about microscopes. ”

Computer science is neither about science nor about computers. Rather than a subject that explores the nature of computation itself, it is an engineering discipline that focuses on building systems that perform computations, that is, on how to use computers to solve problems.

Likewise, geometry originally focused on measuring land, and later evolved into an abstract mathematical discipline that studies the properties of space and shapes.

The main problem computer science tries to solve is how to describe the process of computation.

In mathematics, functions describe relationships between quantities. But an equation cannot tell us how to compute the value of a function. Computer science provides a way to describe that process: how to compute and solve the functions.

The main purpose is to find a way to formalize such a process, to describe the process of computation itself. In some cases, systems are so large and complex that nobody can fully understand the whole system. That is why we need to build abstractions to help us manage the complexity of such systems. What makes this possible is the idea of procedures, which can be used to build abstractions: a technique for managing complexity.

A computer is a virtual environment that is not affected by real-world constraints, so a system can be built in any way we want. The only limitation is our imagination and creativity. It is an ideal system.

2.1.2.6.1.1 Preface
2.1.2.6.1.2 Section 1: Building Abstractions with Procedures

The first way to build abstractions is black boxes, that is, procedures. A procedure accepts some inputs and produces some outputs without revealing the internal details of how it works. Nowadays this is called encapsulation.

Fixed points: a fixed point of a function is a value that does not change when the function is applied to it. What we want here is a way to compute such fixed points and to package the process into a procedure. How to achieve this is instructive knowledge. How do we apply such a procedure? How do we use it to find the fixed points of other functions? And how do we build new procedures on top of it?
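
The course itself works in Scheme; the following is only a rough sketch in C of packaging the fixed-point search into a procedure that takes another function as input. The names fixed_point and sqrt2_step, the tolerance parameter, and the iteration cap of 1000 are all assumptions made for illustration.

```c
/* Applies f again and again, starting from guess, until two successive
   values differ by less than tolerance. */
double fixed_point(double (*f)(double), double guess, double tolerance) {
    for (int i = 0; i < 1000; i++) {   /* cap on rounds: an assumption */
        double next = f(guess);
        double diff = next > guess ? next - guess : guess - next;
        if (diff < tolerance) {
            return next;
        }
        guess = next;
    }
    return guess;   /* no convergence within the cap */
}

/* y -> (y + 2 / y) / 2 has the square root of 2 as its fixed point. */
double sqrt2_step(double y) {
    return (y + 2.0 / y) / 2.0;
}
```

fixed_point(sqrt2_step, 1.0, 1e-9) converges to about 1.41421. Passing a different function reuses the same search, which is the sense in which the process has been packaged.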

This chapter covers several topics:

  • Primitive elements
  • Combinations
  • Abstraction, and how to build new abstractions
  • Extracting common patterns
2.1.2.6.1.2.1 Lisp

The main purpose of this section is not to program in Lisp, but to learn how to think about programming. What we are about to learn is a general framework composed of primitives, means of combination, and means of abstraction.

Combinations of Lisp expressions are organized in a tree structure, i.e., S-expressions. P.S. In a compiler, such a tree structure is called an abstract syntax tree (AST).
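To make the tree structure concrete, here is a small sketch that represents a combination as nested Python tuples and evaluates it recursively. The tuple encoding, the PRIMITIVES table, and the evaluate function are hypothetical helpers, not part of any Lisp implementation.

```python
import operator
from functools import reduce

# Assumed encoding: an expression is a number, or a tuple (operator, operand, ...).
PRIMITIVES = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def evaluate(expr):
    """Evaluate a nested-tuple expression by walking the tree bottom-up."""
    if isinstance(expr, (int, float)):
        return expr                       # a primitive element evaluates to itself
    op, *operands = expr
    values = [evaluate(e) for e in operands]   # evaluate the subtrees first
    return reduce(PRIMITIVES[op], values)      # then combine them

# Corresponds to the S-expression (+ 1 (* 2 3) (- 10 4))
tree = ("+", 1, ("*", 2, 3), ("-", 10, 4))
print(evaluate(tree))  # 13
```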

2.1.2.6.1.2.2 define

The way to build new abstractions is to use define. By extracting general ideas from specific examples, it is possible to create new procedures.

2.1.3 Uncategorized
2.1.3.1  Not actually even a diary [diary]
2.1.3.1.1 2025-12-21 01:35: Nothing Special

Of course, this should be written in the form of mail. Or, in my original plan, as chat history.

Too difficult to design a function.

2025-12-21 01:42
Self
With nickname, made up, date time, when it was sent, and content. But after all, just keep it simple. Not bad to just call a library.
2025-12-21 02:11
It works now.
2025-12-21 02:12
2.1.3.1.2 2025-12-21 02:13: Silent Message
2025-12-21 02:13
Self

You know, I am always alone. There is no one to talk to.

They never care about what I say, about the things I care about.

Assembly code, computation theory, mathematics: the things they never understand, the things they never ever care about.

I cannot even cry. Nobody will give even a glance.

Right, you know.

On my poor stupid.

2025-12-21 02:23
Self

The only reason you suffer is that you try so hard to spare others.

So stupid.

What the fuck have you even written?

In fucking English.

You desire care.

Long for love.

2025-12-21 02:30
Self

No, never, ever think about girls. You stupid motherfucker.

The only thing you can achieve is messing everything up.

Weeb, ah,

They, won't ever, even, want you to be their friend.

Understand?

2025-12-21 02:37

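The only things that ever message you first are Steam promotions and GitHub Education notifications.

Hilarious.

2025-12-21 02:39

Three years of college and nothing to show for it,

and you talk about Lilies? Daydreaming.

Go on, keep playing with that crappy assembly of yours every day. You can't even afford a meal.

Who would pay attention to you?

2025-12-21 02:40

Unpopular too, with a personality like crap.

Don't you feel sick looking in the mirror?

Computer science, yeah right.

2025-12-21 02:41

Slacking off every day, doing nothing useful, just fooling around with pointless stuff.

Finals are almost here. Doomed.

2025-12-21 02:42
So lazy you're growing maggots; too slow even to get the scraps while they're warm.
2025-12-21 02:43

What can you do every day besides fantasizing that someone likes you?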

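2.1.3.1.2.1 2025-12-21 18:38: Slacking-Off Diary

Honestly, rereading what I rant about at night makes me cringe every day now. So edgy.

Self

Some people are so fortunate, so... happy, that they have no idea, no experience, of what pain is.

How enviable, how... it cannot even be called jealousy; I can only wish them well.

Because such things, such happiness, will never belong to us. Never.

Until we die, leaving behind a lifetime of pain.

Our fate, our obligations, our responsibilities cannot be changed.

Do not harbor feelings you should not have. Since it does not belong to you, never hope for it.

On the day you die, flowers will fall, but not for you.

Let the times roll on and crush the mortal world; you alone will not be remembered.

This is us, pathetic to the extreme.

"Scum, trash"

Because of selfishness, because of cowardice, without ambition, idle in body.

I just want to die an ordinary death like this.

Can no longer bear our past, my future.

Whining over nothing.

Only a peaceful death can offer comfort.

Take us away, until dawn.

How much I want to know how to go on living.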

2.1.3.1.3 2025-12-22 17:52: We choose to give up

You should understand that the emotion called love comes from chemical reactions in the brain. Something not logical, not rational.

The purpose of life, which we are born with, assigned by the creator, is to survive and to reproduce. It's not our choice to make.

We choose to give up, we choose to fight for ourselves, we choose to discard natural selection.

The idea is, the emotions, derived from chemical reactions in the brain, affected by hormones, distorted by the environment, drive us crazy, make us nonsensical.

History won't remember us; only we ourselves will.

2.1.3.1.4 2026-01-02 23:35:

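We once longed for the stars, but time wore our passion away.

Still, we will go on being mad, for the future, for foolishness.

We need only pursue the future, visible and invisible.

The future we will eventually reach, the one that cannot be changed.

A sorrowful, painful, forever grieving future.

We already know the future is beyond our power; we already know the future is dim.

We seek, we explore, we stop at nothing.

Only to move forward, only for pain.

Lashes, disappointment, ridicule, always alongside.

Loneliness, anger, fear, always showing.

The future is an anchor, is knowledge, is a way out.

This is knowledge both knowable and unknowable, impossible to pass on, grasped only in a single instant.

2.1.3.1.5 2026-01-05 01:28: May the gods grant me eternal rest
2.1.3.1.6 2026-01-24 19:39: Found a boyfriend!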
2.1.3.2  Assembly, Constitution Principle of Computer, Computer Organization and Architecture and Operating System (Section I) [S1]
2.1.3.2.1 Assembly, Constitution Principle of Computer, Computer Organization and Architecture and Operating System
2.1.3.2.1.1 Section I: Basis Assembly
2.1.3.2.1.2 Coding, Numeration, Radix

Values, plain bits, expressed as high or low voltage levels, can represent information. Given the corresponding context or encoding, together with properties of its own such as a name, a value can be interpreted as real information: data. Raw information, data, must be stored somehow. The way of translating original data into values that can be stored in a computer is called "coding". Encoding converts data into a specific format or representation.

Coding helps people understand data.

2.1.3.2.1.2.1 Symbol, Calculation & Presentation

Calculation is a relation between different data. Directly, it manipulates different values in different codings.

2.1.3.2.1.2.2 Decimal

Decimal integers are numbers in base ten, which means every number written in decimal form contains only the digits 0-9. Each digit's value depends on its position, as a power of 10.

2.1.3.2.1.2.3 Binary

Binary integers are numbers in base two: whenever a digit would reach the value 2, it produces a carry. Thus only 0 and 1 appear in a binary representation. Digits in a binary representation are called "bits".

Each bit's value depends on its position, as a power of 2.

2.1.3.2.1.2.4 Hexadecimal, Octal

Hexadecimal numbers are in base 16, while octal numbers are in base 8.

2.1.3.2.1.2.5 Radix conversion

Refer to radix.
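As an illustration of positional notation, radix conversion can be sketched with repeated division (to a base) and repeated multiplication (from a base). The function names to_radix and from_radix and the limit of base 16 are assumptions for this example.

```python
DIGITS = "0123456789ABCDEF"

def to_radix(n, base):
    """Convert a non-negative integer to its digit string in the given base (2..16)."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, base)   # the remainder is the next lowest digit
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))

def from_radix(text, base):
    """Convert a digit string in the given base back to an integer."""
    value = 0
    for ch in text.upper():
        digit = DIGITS.index(ch)
        if digit >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        value = value * base + digit     # shift one position left, then add the digit
    return value

print(to_radix(255, 2))       # 11111111
print(to_radix(255, 8))       # 377
print(to_radix(255, 16))      # FF
print(from_radix("1011", 2))  # 11
```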

2.1.3.2.1.2.6 Data, Numbers, Computer

Data is represented as binary numbers in a computer.

Each cell of a calculation unit can only have two states, open and closed, which correspond one-to-one with binary bits.

2.1.3.2.1.3 CPU, BUS, Memory

The most important part of a computer is the CPU. The CPU, central processing unit, controls almost all of a computer's calculation.

Furthermore, the ALU, arithmetic logic unit, is the core of the CPU. The ALU is responsible for arithmetic and logical computations. Without an ALU, the CPU would be unable to perform its core operations.

Registers are another core part of the CPU; they give the CPU the ability to store data.

The CU (control unit), the front end of a CPU, controls the behaviour of the whole CPU. The CU fetches instructions, preprocesses them and determines their execution order. Preprocessing can include PreDecode, Decode, Micro-Fusion / Macro-Fusion, Branch Prediction and Static Prediction.

To speed up floating-point calculation, some CPUs also have an FPU, floating-point unit.

Memory access is another function a CPU must have, so an AGU, address generation unit, or ACU, address calculation unit, helps the CPU calculate address offsets into main memory.

The MMU, memory management unit, is a control unit that may sit outside the CPU; it controls memory and maps logical addresses to physical addresses.

The TLB, translation lookaside buffer, is a critical cache for memory management. Every time the CPU maps an address and fetches data from memory, it may consult the TLB, so that address translation is sped up by reusing existing mapping entries.

The cache is a general-purpose buffer for data fetched from memory. Once data is cached, it can later be accessed much faster than data that exists only in main memory. After cached data has been accessed, changed and used, it may be written back to memory when everything is finished.

2.1.3.2.1.3.1 Data, Instructions

Data, the raw information, takes on a specific meaning once it is interpreted together with its context and name. Instructions are represented in the same way as regular data, as binary numbers.

Data is what a computer processes, and instructions specify how to do so.

2.1.3.2.1.3.1.1 Dimension, Unit

To measure how much data there is, units must be specified.

Unit   Conversion   From
bit    /            None
Byte   8            bit
KiB    1024         Byte
MiB    1024         KiB
GiB    1024         MiB
TiB    1024         GiB
PiB    1024         TiB
kB     1000         Byte
MB     1000         kB
GB     1000         MB

The most commonly used unit in computing is the byte; it is also the smallest unit of data a computer can address (on most computers).

In information theory, the smallest unit is the bit, which is also the smallest unit for measuring memory. For most memory (SRAM, DRAM), the smallest storage cell also holds one bit. In most architectures memory is accessed in bytes, but some special processors can address individual bits, and some even more special ones address by word or double word.

Processors may treat data differently as well. As for processing granularity, a byte is typically the smallest independently loadable/storable object, whereas the minimum operand width for arithmetic/logic operations depends on the ISA (commonly 8/16/32/64 bits).

2.1.3.2.1.3.2 Harvard, von Neumann Architecture

As mentioned before, data and instructions are both stored in binary form, so the CPU cannot actually tell whether a piece of memory holds data or instructions.

Thus there are two ways to store them.

One is the "von Neumann architecture", in which data and instructions share the same memory space. Here it depends on context to distinguish data from instructions.

The other is the "Harvard architecture", in which data and instructions are stored in two separate memories.

The von Neumann architecture gives programmers the flexibility to treat data as instructions, which makes self-modifying code possible. For example, some JIT compilers are implemented this way.

The Harvard architecture, however, prevents data from being treated as instructions. Though this reduces flexibility, it removes the ambiguity.

2.1.3.2.1.3.3 Program Counter, Instruction Register

How does a program execute? The CPU reads instructions and then executes them. Both the Harvard and the von Neumann architecture follow this process.

But how does the CPU read instructions? Consider the von Neumann architecture first: data and instructions are mixed together in memory, so something must record where the next instruction is, so that the processor does not read the wrong memory. Each time the processor wants to execute the next instruction, it consults this record. After executing one instruction, it advances the record to the next instruction, so that the processor executes the whole program in the specified sequence rather than executing one instruction repeatedly. What happens when we switch to a Harvard architecture processor? Even though the place where instructions are stored is fixed during a program's execution, the processor must still know how many instructions it has executed and where the next instruction is.

Thus, in practice, there must be an abstract register, called the "program counter" (PC), that tracks instruction execution.

But where does the CPU read instructions into? To parse an instruction and learn the details of its execution, the CPU first reads the instruction at the address given by the PC and then puts what it read into the IR, the "instruction register".

The instruction is then parsed, analyzed, and takes effect.

PC and IR are both abstractions of physical registers. They may not exist under those names in a real CPU, but there must be a register, or a group of registers, that performs the functions they describe.

2.1.3.2.1.3.4 Memory Address Register, Memory Buffer Register, Memory Data Register & Memory

When the CPU accesses memory, it also needs something to record where it means to read, just as the PC records which instruction should execute next. The MAR, "memory address register", records which memory location should be accessed next. And just as the IR holds the instruction that was read, the MBR, "memory buffer register", holds the data read from memory.

In some cases, the MBR is also called the "memory data register", MDR.

Furthermore, and most importantly, the MAR and MBR are likewise abstractions, not necessarily real registers.

2.1.3.2.1.3.5 Fetch-Execute Cycle

When the CPU executes a program, it follows the fetch-execute cycle: until it receives a halt instruction, it repeats the process of fetch, decode, execute.

Instructions are stored in memory, and the CPU must read them before it can decode them. The CU controls the whole process of reading and decoding.

The CPU first determines the logical memory address from the PC and sends the memory request to the MAR. The MAR holds the request and communicates with main memory. Main memory passes the requested data or instructions back to the CPU over the bus, where they are stored in the MBR. The IR then takes the instruction from the MBR, and the full instruction is split into an operator part and an address part. Finally, the ALU is called to actually execute the instruction.

This is a full fetch-execute cycle for CPU.
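The cycle can be sketched as a toy simulator. Everything concrete here is an assumption made for illustration: the four opcodes, the encoding opcode * 100 + address, the single accumulator, and the function name run. It models a von Neumann machine, so the program and its data live in the same list.

```python
HALT, LOAD, ADD, STORE = 0, 1, 2, 3   # hypothetical opcodes

def run(memory):
    """Execute the program stored in memory until HALT, then return memory."""
    pc, acc = 0, 0
    while True:
        # Fetch: PC -> MAR, memory[MAR] -> MBR, MBR -> IR, then advance the PC.
        mar = pc
        mbr = memory[mar]
        ir = mbr
        pc += 1
        # Decode: split the instruction into an operator part and an address part.
        opcode, address = divmod(ir, 100)
        # Execute.
        if opcode == HALT:
            return memory
        if opcode == LOAD:
            mar = address
            mbr = memory[mar]
            acc = mbr
        elif opcode == ADD:
            mar = address
            mbr = memory[mar]
            acc = acc + mbr            # the addition is the ALU's job
        elif opcode == STORE:
            mar = address
            mbr = acc
            memory[mar] = mbr
        else:
            raise ValueError(f"unknown opcode {opcode}")

program = [
    LOAD * 100 + 5,    # address 0: ACC <- memory[5]
    ADD * 100 + 6,     # address 1: ACC <- ACC + memory[6]
    STORE * 100 + 7,   # address 2: memory[7] <- ACC
    HALT * 100,        # address 3: stop
    0,                 # address 4: unused
    2, 3, 0,           # addresses 5, 6, 7: data
]
print(run(program)[7])  # 5
```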

2.1.3.2.1.3.6 CISC & RISC

CISC, Complex Instruction Set Computer, is a family of architectures that tries to improve performance by reducing the number of instructions needed for specific operations. In general, a CISC computer has more special-purpose instructions, so that a complex operation can be performed by a single instruction. CISC instructions are often multi-byte, and their length varies with their purpose. The number of CPU cycles an instruction consumes also varies. CISC typically provides a variety of memory addressing modes.

RISC, Reduced Instruction Set Computer, instead tries to reduce the number of instruction types. Since most CISC instructions are not used frequently, and some of them can be seen as combinations of simpler, high-frequency instructions, improving the performance of the basic instructions can give higher performance overall, and it makes ISA design simpler as well. RISC instructions typically have a fixed length, and most are designed to complete in one CPU cycle. With pipelining, the average cost per instruction can approach one cycle, or drop below it on designs that issue several instructions per cycle. Memory addressing modes are limited, and most operations are carried out in registers.

Most registers in CISC have their own dedicated function, while those in RISC are mostly general-purpose. Furthermore, RISC designs usually have more registers than CISC designs.

The CPU control methods adopted by the two kinds of architecture also differ: CISC often uses microprograms to control the whole CPU, while RISC uses hardwired logic circuits.

2.1.3.2.1.3.7 Cache

Inside the CPU, fetching anything from outside the registers is slow, so caching frequently used data is a good idea. A cache can have multiple levels, each farther away from the core.

The L1 and L2 caches usually belong to a single core, while the L3 cache is usually shared by the whole CPU.

2.1.3.2.1.3.8 Memory

Memory is where most data and instructions are stored. The CPU uses it to hold data, store results and communicate with other components.

Primary memory, usually RAM (random access memory), comes in different kinds. There are two main kinds of RAM:

  • Static RAM, SRAM: RAM that uses flip-flops to store bits. "Static" means that SRAM needs no extra operations to retain data. It has relatively fast access among all kinds of memory.

    • Sync SRAM
    • Async SRAM
    • Burst SRAM
  • Dynamic RAM, DRAM: RAM that uses capacitors to store bits. "Dynamic" means that, in contrast, it must be refreshed regularly, because the capacitors leak charge over time. DRAM typically has smaller cells and lower voltage levels, but slower speed.

    • DDR
    • LPDDR

Memory can also be distinguished by its error checking and correcting (ECC) ability:

  • Regular memory
  • ECC memory

Recently (but not that recently), a new kind of memory appeared, Optane memory, which retains data even after power-off.

Devices other than main memory have their own memory too; for example, hard drives may have their own cache (a memory) for exchanging data with the CPU.

2.1.3.2.1.3.8.1 Address

Memory is a physical device, but it cannot be accessed through its physical details; otherwise, every program vendor would have to ship a different build for every combination of memory, CPU and other hardware, accounting for the size of the memory, its design, even its ID.

So mapping physical memory units into logical memory is essential. In a computer, we assume memory is contiguous, no matter how many memory modules are installed or what size each of them has. We then split this contiguous space into pieces of the same logical size and assign each piece an ID for reference. These IDs are like the numbers on safe-deposit boxes in a bank: by going to the box with the corresponding number, we can store things in it or take them out.

You can even store, inside one box, the ID of another box, and then find that other box through the one you hold.

Other memories (or special devices that can be abstracted as memory) are also mapped and concatenated into the logical memory, so the CPU can access those devices without specifying their hardware details.

We call this ID an "address". Every address indexes one unit of memory.

2.1.3.2.1.3.8.2 Bytes, Word, Double Word and Half-Word

In assembly, or CPU design, there is another set of measures for data:

Name          Conversion   From
bit           /            None
Byte          8            bit
Half          4            bit
Word          2            Byte
Double Word   2            Word
Quad Word     4            Word
Paragraph     8            Word

These units measure the data a computer can manipulate DIRECTLY.

2.1.3.2.1.3.8.3 Direct Memory Access

Most of the time the CPU does calculation, which takes relatively little time. But when the CPU has to access memory or other devices, it takes multiple cycles to fetch the data and to transfer data to and from memory.

Thus it is natural to have a specially designed device fetch data for the CPU. When data has to be fetched from a peripheral, the DMA controller takes over the job and copies data from the device into memory, while the CPU carries on with its own calculation.

2.1.3.2.1.3.9 ROM

Besides memory, there is another kind of data storage: ROM, read-only memory.

This kind of storage keeps data without any refresh, so even a power-off does not erase it. It is therefore often used to store the BIOS.

Over time, ROM developed into EPROM, EEPROM and NAND flash. EPROM can be erased with light and rewritten using a special tool, while EEPROM and NAND flash can be erased and rewritten purely electrically. NAND flash is the basis of USB flash drives and SSDs.

2.1.3.2.1.3.10 Storage

Hard drives, together with old-school floppy drives, are a computer's storage. They have more space and retain data more reliably than memory, and they are responsible for keeping data long-term.

But storage is much slower than memory.

2.1.3.2.1.3.11 BUS

How does the CPU actually reach the data and the devices it needs?

In a modern computer system, the CPU communicates with other devices through a bus.

Why do we need a bus rather than some other communication architecture?

  • A bus decreases complexity: in other schemes, such as direct point-to-point communication, connecting N devices pairwise needs at least C(N, 2) = N(N-1)/2 circuits. With a bus, the N-to-N network topology is reduced to an N-1-N topology, or to an N-1-Adapter-1-N bus-star topology.
  • A bus also standardizes the interface for devices. Before PCIe, devices used many different connectors.
2.1.3.2.1.3.11.1 Address BUS

The address bus, as its name suggests, is used to transfer memory addresses. With the address bus, the CPU can reach the memory it wants.

The address bus carries address information, and only in one direction: from the controller to the terminal device. The width of the address bus determines the largest memory space a computer can address.

With a 32-bit address bus and byte addressing, the CPU can address at most 2^32 bytes, i.e., 4 GiB.
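The relation between bus width and addressable space is just a power of two. A small sketch, assuming byte-addressable memory; the helper names addressable_bytes and human_size are made up for this example.

```python
def addressable_bytes(address_bus_width):
    """Number of distinct byte addresses an address bus of the given width can express."""
    return 2 ** address_bus_width

def human_size(n):
    """Format a byte count using binary units (exact for powers of two)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    index = 0
    while n >= 1024 and index < len(units) - 1:
        n //= 1024
        index += 1
    return f"{n} {units[index]}"

for width in (16, 20, 32, 64):
    print(width, human_size(addressable_bytes(width)))
# 16 64 KiB
# 20 1 MiB
# 32 4 GiB
# 64 16 EiB
```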

2.1.3.2.1.3.11.2 Data BUS

The data bus transfers the actual data. After the CPU specifies the address it wants over the address bus, the terminal device returns the data stored there to the CPU over the data bus. The CPU also writes its results to memory over the data bus.

The data bus is bidirectional: data may go from the CPU to be written to a terminal device, or come from the device to be fetched by the CPU. The width of the data bus limits the maximum amount of data the CPU can fetch or write in one transfer.

For a CPU with 64-bit registers and a 64-bit data bus, a whole register can be stored in a single transfer.

2.1.3.2.1.3.11.3 Control BUS

The control bus transfers control and status signals. Both sides can send and receive signals over the control bus. The width of the control bus can affect the operations of the CPU.

Signals sent over the control bus control the behaviour of devices; for example, a read or write signal sent to storage instructs it which data to read or how to store some data. Signals sent by terminal devices may also affect the CPU; for example, an I/O-completion interrupt signal tells the CPU that some data has finished being read.

2.1.3.2.1.3.11.4 Dual Independent BUS: North, South Bridge

In a traditional bus system, one bus connects all the components of a computer. This wastes a lot of time during I/O transfers.

It is then possible to separate high-speed and low-speed devices onto two buses.

The back-side bus, inside the CPU, connects the CPU's internal units: ALU, CU and so on. The front-side bus, outside the CPU, connects the CPU with the north and south bridges.

  • North bridge: connects the CPU, the south bridge and other high-speed devices, such as main memory and high-speed caches.
  • South bridge: connects to the north bridge and other low-speed devices.

    • PCI: high speed I/O devices
    • ISA: low speed I/O devices
2.1.3.2.1.3.12 Stack

Since memory is represented logically as one large contiguous space, finding ways to manage the data in it is a big problem.

A simple way to manage data is the stack.

A stack is a linear first-in-last-out data structure. First choose an address as the base of the stack; then we can push data onto the stack and pop data off it. It is also possible to index an element inside the stack by its offset.

2.1.3.2.1.3.12.1 Stack grows downwards

In a computer, contiguous memory has addresses; an address with a larger value can be seen as a higher address, and thus we can define the direction of the stack.

In general, we choose a higher address as the base of the stack, so growing the stack moves its top towards lower addresses.

Why the stack always starts from a higher address: https://github.com/mujiu555/Wishful-Thinking/blob/mujiu555@feat/c/doc/root/c/typ/S1.typ.

2.1.3.2.1.3.12.2 Push

A push operation makes the stack grow. It moves the stack-top pointer to make room (on a downward-growing stack, by decreasing it) and stores the new element at the new top of the stack.

2.1.3.2.1.3.12.3 Pop

A pop operation makes the stack shrink. It copies the value stored at the top to somewhere else, and then moves the stack-top pointer back (on a downward-growing stack, by increasing it).
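Push and pop on a stack that grows downwards can be sketched as follows. The class name DownwardStack, the use of a Python list as "memory", and the convention that sp points at the current top element are assumptions of this example; they mirror the common x86 convention but are not taken from the text.

```python
class DownwardStack:
    """A stack whose base is the highest address and which grows towards address 0."""

    def __init__(self, size=16):
        self.memory = [0] * size
        self.sp = size              # one past the highest address: the stack is empty

    def push(self, value):
        if self.sp == 0:
            raise OverflowError("stack overflow")
        self.sp -= 1                # growing the stack moves the top to a lower address
        self.memory[self.sp] = value

    def pop(self):
        if self.sp == len(self.memory):
            raise IndexError("stack underflow")
        value = self.memory[self.sp]
        self.sp += 1                # shrinking moves the top back to a higher address
        return value

    def peek(self, offset=0):
        """Index an element by its offset from the top of the stack."""
        return self.memory[self.sp + offset]

stack = DownwardStack(8)
for item in (10, 20, 30):
    stack.push(item)
print(stack.sp)        # 5
print(stack.peek(1))   # 20
print(stack.pop())     # 30
print(stack.sp)        # 6
```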

2.1.3.2.1.3.13 Registers

Registers are the most basic functional units in a CPU. They store data and feed it into calculations.

The following registers are commonly used in 8086, i386, x86, ia32 and amd64 (x86_64).

2.1.3.2.1.3.13.1 AX(Accumulator), BX(Base Address), CX(Counter), DX(Data)

x86_64 has four classic general-purpose registers: *AX, *BX, *CX and *DX.

These general-purpose registers can be subdivided and used as smaller registers.

Name          Representation  64-bit  32-bit  16-bit  8-bit
Accumulator   *AX             RAX     EAX     AX      AH, AL
Base Address  *BX             RBX     EBX     BX      BH, BL
Counter       *CX             RCX     ECX     CX      CH, CL
Data          *DX             RDX     EDX     DX      DH, DL
  • The *AX register takes part in most calculations and holds the results of mul and div operations as well as function return values.
  • The *BX register is used for rebasing, as the base for memory access offsets.
  • The *CX register is treated as a counter and is decremented automatically by loop.
  • The *DX register passes arguments and is used for I/O operations.
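How the smaller registers overlay the 64-bit one can be shown with bit masks. This sketch only models the aliasing of RAX, EAX, AX, AH and AL as views of one value; the function name sub_registers is made up, and it does not model write behaviour.

```python
def sub_registers(rax):
    """Return the values seen through each smaller view of a 64-bit RAX value."""
    rax &= 0xFFFF_FFFF_FFFF_FFFF
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFF_FFFF,     # low 32 bits
        "AX": rax & 0xFFFF,           # low 16 bits
        "AH": (rax >> 8) & 0xFF,      # bits 8..15, the high byte of AX
        "AL": rax & 0xFF,             # bits 0..7, the low byte of AX
    }

for name, value in sub_registers(0x1122334455667788).items():
    print(f"{name:>3} = {value:#x}")
# RAX = 0x1122334455667788
# EAX = 0x55667788
#  AX = 0x7788
#  AH = 0x77
#  AL = 0x88
```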
2.1.3.2.1.3.13.2 CS:IP(Code Segment: Instruction Pointer)
2.1.3.2.1.3.13.3 SS:BP, SS:SP (Stack Segment: Base Pointer, Stack Segment: Stack Pointer)
2.1.3.2.1.3.13.4 SI, DI (Source Index, Destination Index)
2.1.3.2.1.3.13.5 DS (Data Segment)
2.1.3.2.1.3.13.6 ES (Extra Segment)
2.1.3.2.1.3.13.7 FLAGs
2.1.3.2.1.3.13.8 R8, R9, R10, …, R15
2.1.3.2.1.3.14 Heap
2.1.3.2.1.4 Syntax
2.1.3.2.1.4.1 Operator, Operand
2.1.3.2.1.4.2 Comment
2.1.3.2.1.4.3 Memory Access
2.1.3.2.1.4.4 Labels
2.1.3.2.1.4.5 Macro
2.1.3.3  The Missing Semester of Computer Education [S1]
2.1.3.3.1 Section I

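Besides algorithms, tools can greatly improve productivity. This is an attempt to teach how to master tools, and to introduce tools that are (perhaps) unfamiliar but useful.

This will cover many areas (11).

Only a few extremely useful tools will be introduced.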

2.1.3.3.2 Shell

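The shell is an important way of interacting with a computer.

It lets you combine text operations:

  1. Commands can be typed directly into the shell
  2. Arguments can temporarily change a program's behaviour (as decided by the program itself)
  3. Arguments are separated by spaces

The shell finds the commands it can run through PATH. PATH is the mechanism for locating executables on the computer.

  1. Absolute path: the full path of a file
  2. Relative path: a file's path relative to the current working directory (pwd)
  3. .: the current directory
  4. ..: the parent directory
  5. ~: the home directory
  6. -: the directory before the last cd

Command arguments:

  1. Usually looked up with --help
  2. - flags: usually short switches, which can be combined freely
  3. Square brackets usually mean that their content is optional

Permissions:

  1. The first character indicates directory / regular file / socket
  2. ugo: owner (user), owning group, others
  3. Nine bits in total: each group of three bits gives the permissions of the corresponding user (group): read/write/execute
  4. Write permission on a directory only affects whether the files inside it can be added, deleted or renamed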
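The nine permission bits can be decoded with a few shifts and masks. A minimal sketch: permission_string is a hypothetical helper, while stat.filemode and stat.S_IFREG come from Python's standard library and are used only as a cross-check.

```python
import stat

def permission_string(mode):
    """Render the low nine permission bits as an ls-style string such as rwxr-xr-x."""
    text = []
    for shift in (6, 3, 0):                  # owner (u), group (g), others (o)
        triple = (mode >> shift) & 0b111
        text.append("r" if triple & 0b100 else "-")
        text.append("w" if triple & 0b010 else "-")
        text.append("x" if triple & 0b001 else "-")
    return "".join(text)

print(permission_string(0o755))  # rwxr-xr-x
print(permission_string(0o640))  # rw-r-----
# The standard library prints the leading file-type character as well.
print(stat.filemode(stat.S_IFREG | 0o755))  # -rwxr-xr-x
```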
2.1.3.3.2.1 Most used Commands
  1. mv:
  2. cp:
  3. mv:
  4. mkdir:
  5. rmdir:
  6. man:

PS. info:

Shortcut: Ctrl-L: clear

2.1.3.3.2.2 Shell Stream

I/O stream redirection

  1. < file: input redirection (from file)
  2. > file: output redirection (into file)
  3. PS. << label: input redirection (until label is read)
  4. >> file: append output to file
  5. prog1 | prog2: pipe, output redirected into another program
2.1.3.3.2.3 Root user

Super administrator

sudo: super user do (do as super user)

2.1.3.3.2.4 /sys

vfs: kernel variables

  1. tee: cat file | sudo tee /target
2.1.3.3.2.5 Shell Scripting

Syntax,

2.1.3.3.3 VIM

I use NeoVim (LazyVim distro).

Repository: [My LazyVim](https://github.com/mujiu555/my-lazyvim)

2.1.3.3.4 Section IV
  • grep

  • less

  • sed

    • regular expression
  • sort

  • head

  • tail

  • uniq

  • wc

  • paste

  • awk

  • bc

  • xargs

  • parallel

2.1.3.3.5 Command Line

Shortcuts for the shell

  • Ctrl-C: SIGINT
  • Signal: kill
2.1.3.3.5.1 Tmux
2.1.3.3.5.2 Dot files

Configurations

  • alias
  • .bashrc
  • PS1
2.1.3.3.6 Debugging
2.1.3.3.6.1 Logger

Logging is like printf, but with more information.

It is possible to print in colour.

ANSI escape codes are used to draw colour.

It is possible to use a third-party logging system. Most logs are placed in /var/log. The systemd journal, read with journalctl, keeps its logs in /var/log/journal.

2.1.3.3.6.2 Debugger

Step debuggers: GDB, and so on.

It is possible to walk through the execution

2.1.3.3.6.3 Static checker

Try to detect errors without actually executing the program.

2.1.3.3.6.4 Inspect
2.1.3.3.6.5 Profiling

Measuring time is useful.

Real time: the wall-clock time a program takes from start to finish. User time: the CPU time the program spends in user code. System time: the CPU time the kernel spends on the program's behalf.
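The difference between the three times can be observed from inside a program. This is a rough sketch using time.perf_counter and os.times from the standard library; the helper name measure and the two sample workloads are made up for illustration.

```python
import os
import time

def measure(func):
    """Run func once and report real, user and system time in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    func()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return {
        "real": wall_end - wall_start,               # wall-clock time
        "user": cpu_end.user - cpu_start.user,       # CPU time in user code
        "sys": cpu_end.system - cpu_start.system,    # CPU time in the kernel
    }

busy = measure(lambda: sum(i * i for i in range(200_000)))  # CPU-bound work
idle = measure(lambda: time.sleep(0.05))                    # mostly waiting
print(busy)
print(idle)
```

For the sleeping workload, real time should be at least the sleep duration while user and system time stay close to zero, which is why real time alone says little about how much CPU a program used.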

2.1.3.3.6.5.1 CPU profiling

Tracing: record all information while the program executes. Sampling: inspect the program at regular intervals.

Thus tracing slows the program down.

Line profiler: the cost of executing each line.

2.1.3.3.6.5.2 Memory profiling
2.1.3.3.6.5.2.1 Analysis

perf:

  • list:
  • stat:
  • record:
  • report:
2.1.3.3.6.5.3 Visualize
  • Flame Graph
  • Call Graph
2.1.3.3.6.5.4 Resource profiling
2.1.3.3.7 Meta programming

DSL everywhere.

2.1.3.3.7.1 Build systems
  • Describe how to build
  • Encode Rules

Make: GNU Make, BSD Make, NMake

2.1.3.3.7.2 Make Rules
2.1.3.3.7.3 Repositories
2.1.3.3.7.4 Version

Semantic Versioning
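A minimal sketch of comparing versions of the form MAJOR.MINOR.PATCH. It deliberately ignores pre-release and build metadata, and the compatibility rule used here (same major version, at least as new) is a simplifying assumption; parse_version and is_compatible are hypothetical names.

```python
def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (pre-release tags are not handled)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)

def is_compatible(installed, required):
    """True if installed has the same major version and is at least as new as required."""
    have, need = parse_version(installed), parse_version(required)
    return have[0] == need[0] and have >= need

# Tuples compare field by field, so 1.10.0 is newer than 1.9.3
# even though the string "1.10.0" sorts before "1.9.3".
print(parse_version("1.10.0") > parse_version("1.9.3"))  # True
print(is_compatible("1.4.2", "1.2.0"))                   # True
print(is_compatible("2.0.0", "1.2.0"))                   # False
```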

2.1.3.3.7.5 CI/CD

Continuous Integration & Continuous Delivery

  • Recipe:
  • Behaviour when something happens
2.1.3.3.7.6 Auto Testing
  • Test Suite: a large collection of tests
  • Unit Test:
  • Integration Test:
  • Regression Test:
  • Mocking:
2.1.3.3.8 Security

Passwords need high information entropy.

2.1.3.3.8.1 Hash functions
  • non-invertible
  • collision resistant

The hash used in Git must be free of collisions in practice, compared with older hashing methods.
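As an illustration of how Git uses a hash as a content address, the ID of a blob object can be reproduced with hashlib. This sketch assumes the classic SHA-1 object format, where the hashed bytes are the header "blob <size>" followed by a NUL byte and the content; git_blob_id is a made-up helper name.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git would assign to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))               # ID of the empty blob
print(git_blob_id(b"hello world\n"))  # any change to the content changes the ID
```

The result can be compared against `git hash-object <file>` in a repository that uses SHA-1 object names.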

2.1.3.3.8.2 Key derivation functions
2.1.3.3.8.3 Symmetric key cryptography
  • keygen() -> key
  • encrypt(plain text, key) -> cipher text
  • decrypt(cipher text, key) -> plain text
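The three-function interface above can be illustrated with a toy XOR cipher (a one-time pad when the key is random, as long as the message, and never reused). This is only a sketch of the interface shape, not a cipher to use in practice; the sym_ function names are made up.

```python
import secrets

def sym_keygen(length):
    """Generate a random key of the given length in bytes."""
    return secrets.token_bytes(length)

def sym_encrypt(plain: bytes, key: bytes) -> bytes:
    """XOR every plaintext byte with the matching key byte."""
    if len(key) < len(plain):
        raise ValueError("key must be at least as long as the message")
    return bytes(p ^ k for p, k in zip(plain, key))

def sym_decrypt(cipher: bytes, key: bytes) -> bytes:
    """XOR is its own inverse, so decryption repeats the same operation."""
    return sym_encrypt(cipher, key)

message = b"attack at dawn"
key = sym_keygen(len(message))
cipher = sym_encrypt(message, key)
print(sym_decrypt(cipher, key))  # b'attack at dawn'
```

The same key is used in both directions, which is what makes the scheme symmetric.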
2.1.3.3.8.4 Asymmetric key cryptography
  • keygen() -> (public key, private key)

  • encrypt(P, public key) -> C

  • decrypt(C, private key) -> P

  • sign(msg, private key) -> signature

  • verify(msg, sig, public key) -> ok?
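The five operations can be sketched with textbook RSA on tiny numbers. Everything here is a toy under stated assumptions: the primes 61 and 53 and the exponent 17 are the usual classroom example, messages are plain integers smaller than n, and there is no padding or hashing, so this is not secure. The rsa_ function names are made up, and pow(e, -1, phi) needs Python 3.8 or later.

```python
def rsa_keygen():
    """Return (public key, private key) built from two tiny fixed primes."""
    p, q = 61, 53
    n = p * q                    # modulus, 3233
    phi = (p - 1) * (q - 1)      # 3120
    e = 17                       # public exponent, coprime with phi
    d = pow(e, -1, phi)          # private exponent: inverse of e modulo phi
    return (n, e), (n, d)

def rsa_encrypt(m, public_key):
    n, e = public_key
    return pow(m, e, n)

def rsa_decrypt(c, private_key):
    n, d = private_key
    return pow(c, d, n)

def rsa_sign(m, private_key):
    n, d = private_key
    return pow(m, d, n)          # signing applies the private exponent

def rsa_verify(m, signature, public_key):
    n, e = public_key
    return pow(signature, e, n) == m

public, private = rsa_keygen()
cipher = rsa_encrypt(65, public)
print(rsa_decrypt(cipher, private))                      # 65
print(rsa_verify(65, rsa_sign(65, private), public))     # True
```

Anyone holding the public key can encrypt or verify, but only the holder of the private key can decrypt or sign.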

2.1.3.3.9 Misc
2.1.3.3.9.1 Change keyboard mapping

Remap fn keys, Caps Lock, shortcuts.

2.1.3.3.9.2 daemons

run commands in background

2.1.3.3.9.3 fuse
2.1.3.3.9.4 background
2.1.3.3.9.5 API
2.1.3.3.9.6 WM
2.1.3.3.9.7 MD
2.1.3.3.9.8 Boot
2.1.3.3.9.9 Docker
2.1.3.3.9.10 Interactive Notepad Programming
2.1.3.3.9.11 GitHub
2.1.3.4  From The C Programming Language To Theoretical Computer Science (Section −1) [S-1]
2.1.3.4.1 From The C Programming Language to Theoretical Computer Science
2.1.3.4.1.1 Section −1: Linux and Tool Chain
2.1.3.4.2 Contents
From The C Programming Language to Theoretical Computer Science
Section −1: Linux and Tool Chain
Intro
Virtualization
Virtual Machine
Full Virtualization & Semi Virtualization
Hardware Virtualization Support
Virtual Box
Operating System
Bootloader
Bootstrap
Kernel
GRUB, Systemd-boot
GNU/Linux, Minix, GNU/Hurd, *BSD, Illumos, Darwin, …: *nix (Unix-Like)
Distribution
Debian, Ubuntu, RHEL, Arch, NixOS, Slackware
Root Distribution
Why Ubuntu
Live CD
Bootstrap
Installation
Partition
Partition Table
File System
Log, CoW, Snapshot
User & Group
Privilege
Root user
Sudo
Terminal, Shell, Terminal Emulator & tty/n
FHS
home
root
bin & sbin
usr
User Commands
Sudoer Commands
commands, parameters, arguments
shell tricks, pipeline, i/o redirection
Foreground & Background
Process Suspend
signal
Terminal Reuse
Aliasing
SSH
Shell substitution
Command line Editor
Version Control
Build System
2.1.3.4.2.1 Intro
2.1.3.4.2.2 Virtualization
2.1.3.4.2.2.1 Virtual Machine
2.1.3.4.2.2.2 Full Virtualization & Semi Virtualization

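Full Virtualization:

Full virtualization emulates the hardware architecture and its operation in software; it is inefficient.

  1. qemu
  2. bochs

Semi Virtualization:

Semi virtualization is assisted by the hardware: instructions executed under virtualization can be sent directly to the hardware and run there. It requires hardware support and cannot emulate across hardware platforms.

  1. KVM
  2. Xen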
2.1.3.4.2.2.3 Hardware Virtualization Support
2.1.3.4.2.2.4 Virtual Box
2.1.3.4.2.3 Operating System
2.1.3.4.2.3.1 Bootloader
2.1.3.4.2.3.2 Bootstrap
2.1.3.4.2.3.3 Kernel
2.1.3.4.2.3.4 GRUB, Systemd-boot
2.1.3.4.2.4 GNU/Linux, Minix, GNU/Hurd, *BSD, Illumos, Darwin, …: *nix (Unix-Like)
2.1.3.4.2.4.1 Distribution
2.1.3.4.2.4.2 Debian, Ubuntu, RHEL, Arch, NixOS, Slackware
2.1.3.4.2.4.3 Root Distribution
2.1.3.4.2.4.4 Why Ubuntu
2.1.3.4.2.4.5 Live CD
2.1.3.4.2.4.6 Bootstrap
2.1.3.4.2.4.7 Installation
2.1.3.4.2.4.8 Partition
2.1.3.4.2.4.9 Partition Table
2.1.3.4.2.4.10 File System
2.1.3.4.2.4.11 Log, CoW, Snapshot
2.1.3.4.2.4.12 User & Group
2.1.3.4.2.4.13 Privilege
2.1.3.4.2.4.14 Root user
2.1.3.4.2.4.15 Sudo
2.1.3.4.2.4.16 Terminal, Shell, Terminal Emulator & tty/n
2.1.3.4.2.4.17 FHS
2.1.3.4.2.4.18 home
2.1.3.4.2.4.19 root
2.1.3.4.2.4.20 bin & sbin
2.1.3.4.2.4.21 usr
2.1.3.4.2.4.22 User Commands
2.1.3.4.2.4.23 Sudoer Commands
2.1.3.4.2.4.24 commands, parameters, arguments
2.1.3.4.2.4.25 shell tricks, pipeline, i/o redirection
2.1.3.4.2.4.26 Foreground & Background
2.1.3.4.2.4.27 Process Suspend
2.1.3.4.2.4.28 signal
2.1.3.4.2.4.29 Terminal Reuse
2.1.3.4.2.4.30 Aliasing
2.1.3.4.2.4.31 SSH
2.1.3.4.2.4.32 Shell substitution
2.1.3.4.2.4.33 Command line Editor
2.1.3.4.2.4.34 Version Control
2.1.3.4.2.4.35 Build System
2.1.3.5  D-Flat Compiler Frameworks [Compiler]
2.1.3.6  D-Flat System Main Description [D_Flat]
2.1.3.7  D-Flat Editor & IDE [Editor&IDE]
2.1.3.7.1 D-Flat Editor
2.1.3.7.2 Configuration Language
2.1.3.7.3 Plugin
2.1.3.7.4 Extension
2.1.3.7.5 IDE Layer
2.1.3.8  Lambda Calculator Simulator (SKI) for Project D-Flat [Lambda]
2.1.3.8.1 Lambda Calculator Virtual Machine Design
2.1.3.9  Lilies: S-Expression Language Build Upon D-Flat System [Lilies]
2.1.3.9.1 Abstract

Lilies (short for "List Interpret Language in s-Expression Syntax") is a dialect of the LISt-Processing (Lisp) language family.

This report describes the design and implementation of the Lilies language.

Lilies is designed to be extremely simple and portable. With a small kernel, clear semantics, and a powerful macro system, Lilies makes it easy to combine expressions into higher-level constructs.

The language is designed to be extensible and flexible: its hygienic macro system lets users define new syntax and corresponding semantics safely. A set of built-in special forms and macros is provided to simplify common programming tasks; these act as syntactic sugar over the core language.

Lilies aims to be efficient, practical, and safe. With a strong type system, an ownership model that enforces memory safety, and compile-time evaluation capabilities, the language helps programmers write efficient and safe code. Lilies can express complex algorithms and data structures in functional, imperative, declarative, and message-passing styles, among others.

The standard library for Lilies is divided into two parts: a core language library that provides basic data types, syntaxes, and contracts; and a compile-time library that supplies macros and compile-time functions.

Lilies should be implemented with both an interpreter and a compiler, together with a REPL, a development environment, a debugger, and other tools, to provide a complete programming experience.

The language has a full type system: primitive types, composite types, generic types, and user-defined types, plus type annotations and type inference. The type system should support type inference, type checking, and type casting, and provide interface, trait, and generic programming capabilities.

Lilies should include a complete module system (module definition, import/export, and versioning) that supports dependency management and module resolution.

It should also include a complete exception handling system (exception definition, throwing and catching, and propagation) with custom exception types and hierarchies.

The language should support a continuation system (definition, capture, and invocation), including first-class continuations and continuation-passing style.

Finally, Lilies should provide a comprehensive metaprogramming system (macros, compile-time functions, and code generation) that supports hygienic macros and compile-time evaluation.

2.1.3.9.2 Introduction

A single general-purpose programming language cannot satisfy every need of every programmer. Reducing language complexity therefore matters: keep a small core and give users the ability to extend the language.

A simple, clear expression syntax and unlimited composability of expressions make it possible to construct a practical and effective programming language.

Lilies draws many design ideas from earlier Lisps and Scheme dialects: first-class functions (procedures), lexical scope, continuations, and macros. Syntax objects can be manipulated programmatically. In contrast to those languages, Lilies is designed with a strong static type system.

Lilies is intended to be a native language that can compete with C, or a compilation target upon which other languages can be implemented. In the D-Flat system, Marguerite is implemented on top of Lilies.

All symbols in Lilies share a single namespace, whether they are variables, functions, classes, interfaces, modules, or other entities. In each expression, operators and operands are distinguished by their positions.

Unlike some Lisp dialects that use function application to implement loops, Lilies provides full functional loop constructs as built-in syntax extensions (outside the minimal core). Tail-call optimization is provided to ensure loops are efficient.

Object-oriented classes are supported. Everything in Lilies is an object, including functions, classes, interfaces, and modules. Classes can be computed at compile time, enabling powerful metaprogramming and generic programming. With traits and interfaces, Lilies supports polymorphism and code reuse. Contracts enable design-by-contract programming. The language also provides full compile-time type checking and type inference.

Modules are first-class citizens: they can be defined, imported, and exported.

The language can capture continuations (the "rest of the computation" at any point), allowing advanced control-flow constructs to be built on top. When a continuation is captured it is saved as an "escape procedure", a function that can be invoked later to resume execution at the capture point. Delimited continuations are also supported.
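The escape-procedure idea can be sketched in Python, which has no continuations of its own. This is only an approximation under stated assumptions: the names `call_with_escape`, `_Escape`, and `first_negative` are hypothetical, the escape is modelled with an exception, and it can only jump outward once, so it does not cover re-entrant or delimited continuations.

```python
class _Escape(Exception):
    """Carries a value from the point of invocation back to the capture point."""

    def __init__(self, token, value):
        super().__init__()
        self.token = token
        self.value = value


def call_with_escape(body):
    """Run body(k); calling k(value) abandons body and returns value from here."""
    token = object()  # distinguishes this capture point from nested ones

    def k(value):
        raise _Escape(token, value)

    try:
        return body(k)
    except _Escape as esc:
        if esc.token is token:
            return esc.value
        raise  # the escape belongs to an outer capture point


def first_negative(items):
    """Use the escape procedure to leave a loop as soon as a match is found."""
    def body(k):
        for item in items:
            if item < 0:
                k(item)
        return None

    return call_with_escape(body)
```

Here `k` plays the role of the escape procedure: invoking it discards whatever `body` was doing and continues after the capture point with the supplied value.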

For higher-level control, algebraic effects and handlers are supported. Although effect handlers can be implemented with continuations, Lilies treats them as a distinct construct with dedicated syntax and semantics.

A full functional exception system is provided. Exceptions can be defined, raised, caught, propagated, and in some cases resumed, allowing flexible handling.

There are several ways to extend the language; macros are the most powerful. Lilies’ macros are hygienic and let users parse ASTs, access or drop contextual information, and generate new syntax trees. Macro-generated syntax can be hygienic or intentionally unhygienic as needed. Syntax objects are first-class, permitting parsing, manipulation, and generation of syntax trees, especially within macros. Another extension mechanism is symbol generation: new expressions can be generated at compile time with specific symbols or attributes (similar in spirit to KSP for Kotlin or Roslyn for C#).

The macro system must ensure that macros can provide the same compile-time information as built-in syntax so the compiler can produce full error diagnostics.

The language is built on an attribute grammar so that each syntax node can carry attributes used to store type information, scope information, and other metadata.

Except for define, no construct may directly create new bindings in the current scope. The let and let: families create bindings through closure capture. The language is designed to be referentially transparent: variables, functions, classes, modules, and macros should be defined before use.

These features make Lilies a powerful tool for building complex software systems and a fertile platform for research in programming theory.

2.1.3.9.2.1 Background

The Lilies language is designed and implemented as part of the D-Flat system. Its purpose is to be both a practical programming language and a powerful tool for implementing other languages.

The design of Lilies borrows many ideas and concepts from other programming languages.

2.1.3.9.2.2 Guiding Principles

The design of Lilies is guided by several principles:

  1. Simplicity: The language should be simple and easy to learn, with a small set of core constructs and clear semantics.
  2. Portability: The language should be portable, able to run on a variety of platforms and architectures.
  3. Extensibility: The language should be extensible, allowing users to define new syntax and semantics without modifying the core language.
  4. Orthogonality: The language should be orthogonal, with constructs that can be combined in a variety of ways without unexpected interactions.
  5. Uniformity: The language should be uniform, with consistent syntax and semantics across different constructs; source code should be treatable as data and vice versa.

For real-world programming, the following principles are also important:

  1. Enable library creation and code reuse.
  2. Provide a strong type system to catch errors at compile time.
  3. Allow for efficient code generation and execution.
  4. Support multiple programming paradigms, including functional, imperative, and declarative programming styles.
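2.1.3.9.3 Overview

This chapter describes the basic concepts of the language as background for the later chapters. It is organized by syntax entry, in the manner of a reference manual, and is not a complete description of the language. In places it is neither complete nor normative.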

2.1.3.9.3.1 Variable, Slots & Fields

A variable in Lilies is a piece of storage allocated to hold a value.

Slots are locations within objects that can hold values, named or not. In practice, a slot is a piece of storage allocated within an object to hold a value.

Fields are similar to slots, but they are named and are used to store values associated with a specific object instance.

2.1.3.9.3.2 Type System

Every value in Lilies has a type. Types classify values and determine which operations can be performed on them.

New types can be defined by combining existing types (structures) or by defining them inductively (recursive types).

Each type is distinct, defined by its name, structure, and behavior. Types can also have hierarchical relationships with other types through inheritance and subtyping. A type is a subtype of another type if and only if it inherits from that type and implements all the traits and interfaces that type implements.

Being a supertype does not mean that every value of the subtype can be treated as a value of the supertype. The only guarantee is that when a constraint requires a value of the supertype, a value of the subtype can be used instead.
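The subtype condition stated above (inheritance plus full coverage of the supertype's traits and interfaces) can be sketched as a small predicate. This is an illustration only: `TypeInfo` and `is_subtype` are hypothetical names, traits and interfaces are reduced to a set of strings, and following the parent chain transitively is an assumption the text does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypeInfo:
    """Hypothetical type record: a name, a declared parent, and implemented traits."""
    name: str
    parent: Optional["TypeInfo"] = None
    traits: frozenset = frozenset()


def is_subtype(sub: TypeInfo, sup: TypeInfo) -> bool:
    """True when `sub` has `sup` among its ancestors and implements all of its traits."""
    ancestor = sub.parent
    while ancestor is not None:
        if ancestor == sup:
            # Inheritance alone is not enough: every trait must be covered too
            return sup.traits <= sub.traits
        ancestor = ancestor.parent
    return False
```

A type that names a parent but omits one of the parent's traits fails the check, which mirrors the "if and only if" wording: both conditions must hold.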

Every type must derive a default "empty" value, together with its corresponding type, which is used when a value of that type is required but not provided. Every type has its own type-checking rules, which determine whether a value is of that type. Empty values can therefore be distinguished from other values of the same type.

2.1.3.9.3.2.1 Basic Types

Primitive types in the Lilies language include:

  • Numbers
  • Booleans
  • Characters
  • Strings
  • Symbols
  • Pairs
  • Vectors
  • Tuples
  • Any
  • None
  • Ignore
  • Meta
  • Unit
2.1.3.9.3.2.1.1 Number Tower

Numbers in Lilies are organized in a type hierarchy known as the "number tower". At the base of the tower is the most general type, Number, which encompasses all numeric types:

  • Number
  • Complex
  • Real
  • Rational
  • Integer
  • Unsigned Integer
  • Zero

Below Unsigned Integer, there are specific types for integers of different sizes:

  • (int 8) or (uint 8)
  • (int 16) or (uint 16)
  • (int 32) or (uint 32)
  • (int 64) or (uint 64)

Zero is a special type that represents the value zero. It can be used to construct other numeric types.

The default empty type for numbers is Zero.

2.1.3.9.3.2.1.2 Booleans

Booleans in Lilies are represented by the type Boolean, which has two possible values: #True (true) and #False (false). The boolean types are organized in a type hierarchy:

  • Boolean

    • True
    • False

The default empty type for booleans is False.

2.1.3.9.3.2.1.3 Characters

Characters in Lilies are represented by the type Character, which represents a single Unicode character. The default empty type for characters is the null character type EOF, whose only instance is #\EOF.

2.1.3.9.3.2.1.4 Strings

Strings in Lilies are represented by the type String, which represents a sequence of objects, typically characters. The default empty type for strings is the Empty type, whose only instance is the empty string "".

A string is serialized data: a contiguous sequence of bytes, whether it holds UTF-8 text, raw bytes, or even integers or complex objects.

Lilies has several kinds of sequence data:

  • Strings, which are described here,
  • Vector, a fixed-size sequence of same-type elements,
  • Tuple, a fixed-size sequence of potentially different-type elements,
  • Array, a variable-size sequence of same-type elements,
  • List, a variable-size sequence of potentially different-type elements, stored as a linked list.
2.1.3.9.3.2.1.5 Symbols

A symbol is a unique and immutable identifier used to represent a name or label in Lilies. Each symbol has its own name, which is a string. Symbols are often used as keys in associative data structures, such as hash tables or dictionaries. Two symbols with the same name are considered equal.

Symbols are interned, meaning that there is only one instance of a symbol with a given name in the system. When a symbol is created, the system checks whether a symbol with the same name already exists and, if so, returns the existing symbol instead of creating a new one.
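Interning as described above can be sketched with a lookup table consulted at creation time. The class below is a hypothetical Python stand-in, not the Lilies implementation; the table layout and the `name` accessor are assumptions made for illustration.

```python
class Symbol:
    """Hypothetical interned symbol: at most one instance exists per name."""

    _table: dict = {}  # name -> the single Symbol instance for that name

    def __new__(cls, name: str):
        existing = cls._table.get(name)
        if existing is not None:
            return existing  # reuse instead of creating a second instance
        sym = super().__new__(cls)
        sym._name = name
        cls._table[name] = sym
        return sym

    @property
    def name(self) -> str:
        return self._name

    def __repr__(self) -> str:
        return f"'{self._name}"
```

Because every request for the same name returns the same object, comparing two symbols reduces to an identity check, which is what makes symbols cheap to use as keys.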

Symbols have their own type, Symbol. The default empty type for symbols is None.

2.1.3.9.3.2.1.6 Pairs

Pairs in Lilies are represented by the type Pair, which represents an ordered pair of values. Pair is a primitive type with generic type parameters, allowing pairs of any two types of values.

A chain of pairs in which each second element is another pair, ending in a pair whose second element is None, is treated as a list. Such lists are linked lists constructed from pairs.

The default empty type for pairs is the Pair::Empty type, whose only instance is the pair (None . None).
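The way lists arise from pairs can be sketched as follows. `Pair` and `is_list` are hypothetical Python names, Python's `None` stands in for the Lilies None value, and the chain is assumed to be acyclic.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Pair:
    """Hypothetical two-slot pair, written (first . second) in the text."""
    first: Any
    second: Any


def is_list(value: Any) -> bool:
    """Follow second elements; a list is a pair chain that ends in None."""
    while isinstance(value, Pair):
        if value.second is None:
            return True
        value = value.second
    return False


numbers = Pair(1, Pair(2, Pair(3, None)))  # the chain (1 . (2 . (3 . None)))
```

A pair such as `Pair(1, 2)` is still a valid pair, but it is not a list because its chain never reaches None.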

2.1.3.9.3.2.1.7 Vectors

Vectors in Lilies are represented by the type Vector, which represents a fixed-size sequence of values. Vector is a primitive type with two generic type parameters: the type of the elements and the size of the vector.

The default empty type for vectors is the Vector::Empty type, a vector type with size 0 and element type None. The only instance of this type is the empty vector #().

2.1.3.9.3.2.1.8 Tuples

Tuples in Lilies are represented by the type Tuple, which represents a fixed-size sequence of values of potentially different types. Tuple is a primitive type with a variable number of generic type parameters, each representing the type of one element of the tuple.

The default empty type for tuples is the Tuple::Empty type, a tuple type that has no elements. The only instance of this type is the empty tuple #<>.

2.1.3.9.3.2.1.9 Any

The Any type is the supertype of all types in Lilies. Every value in Lilies is of type Any, but the Any type cannot hold a value directly, nor can it be instantiated.

In practice, the Any type is used as a placeholder when the specific type of a value is not known or not important.

The Any type has no default empty type.

2.1.3.9.3.2.1.10 None

The None type is the subtype of all types in Lilies. None represents the absence of a value. The None type can hold only one value, which is also called None.

In practice, the None type is used to indicate that a value is missing or not applicable.

The None type is the default empty type for Symbols and for itself.

2.1.3.9.3.2.1.11 Ignore

The Ignore type is a special type indicating that a value should be ignored. Values of the Ignore type are not stored or used in any way. It is often used where a value is required by the syntax or semantics of the language but the value itself is not important. The Ignore type has only one value, which is also a variable and is also called Ignore.

In practice, the Ignore type is used to indicate that a value should be ignored or discarded.

The Ignore type is its own default empty type.

2.1.3.9.3.2.1.12 Meta

The Meta type is the type of types in Lilies. A Meta value may be a structure description or a type generator.

The Meta type is always guaranteed to be non-empty, and thus has no default empty type.

2.1.3.9.3.2.1.13 Unit

Every structure that has no fields is considered a Unit type. The Unit type is therefore not a primitive type but a special structure type.

Unit types cannot have instances, and thus have no default empty type.

2.1.3.9.3.2.2 Syntax Object

Syntax objects in Lilies are representations of code as data structures, together with contextual information such as scope and source location. Syntax objects are special enough that they should be built in and given first-class status in the language.

2.1.3.9.3.2.3 Closure Type

A function in Lilies represents a mapping from a set of input values (parameters) to a set of output values (return values). Functions can also capture the lexical scope in which they are defined, forming closures.

A closure type constructs the type of a function, including the types of its parameters and return values.

2.1.3.9.3.2.4 Composite Types

The Lilies language provides several composite type constructors, including:

  • product types

    • tuples
    • pairs
    • vectors
    • lists
    • arrays
    • maps
    • structures
  • sum types

    • tagged unions
  • recursive types

    • linked lists
  • intersection types

    • traits
    • interfaces

Some of them are built-in primitive types with generic type parameters, such as tuple, pair, and vector. Others are constructed through type definition syntax, such as structures, unions, and recursive types.

Use type to define new recursive types by creating type generators that can produce types based on type parameters. A type described by type does not actually create a new type; instead, it yields a new type checker that can check whether a value is of the described type.
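The idea that a type description yields a checker rather than a new type can be sketched with predicate-returning functions. Everything here is a hypothetical illustration: `list_of` plays the role of a type generator, a list is modelled as either `None` or a `(head, tail)` tuple, and none of these names come from Lilies.

```python
from typing import Any, Callable

Checker = Callable[[Any], bool]


def list_of(element: Checker) -> Checker:
    """Type generator: from an element checker, build a checker for linked lists."""

    def check(value: Any) -> bool:
        # Recursive shape: a list is None, or a (head, tail) whose tail is a list
        while value is not None:
            if not (isinstance(value, tuple) and len(value) == 2):
                return False
            head, value = value
            if not element(head):
                return False
        return True

    return check


def is_int(value: Any) -> bool:
    """Checker for plain integers (bool is excluded on purpose)."""
    return isinstance(value, int) and not isinstance(value, bool)


int_list = list_of(is_int)
```

No new runtime type exists for "list of integers" in this sketch; `int_list` is just a function that accepts or rejects values, and generators compose, so `list_of(int_list)` checks lists of integer lists.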

2.1.3.9.3.2.5 Enum Types

Enumeration types in Lilies are a special form of tagged union that represents a set of named values.

2.1.3.9.3.2.6 Internal Types

Internal types in Lilies are special types that are used by the language implementation itself and are not intended to be used directly by programmers. The only exception is the Syntax Object type, which is used in macros and syntax manipulation.

2.1.3.9.3.2.7 Generic

Generics are implemented in several different ways in practice, including:

  • monomorphization
  • type erasure
  • dictionary passing / witness table
  • reified generics
  • boxing / universal representation
  • compile-time type computation / metaprogramming
  • canonicalization

In the Lilies language, compile-time type computation is the main approach used to implement generics.
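Type computation can be pictured as a function from type parameters to a type. The sketch below uses the vector parameters mentioned earlier (element type and size); `vector_type` is a hypothetical name, and Python evaluates it at run time, so this only approximates what a compiler would do ahead of time.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def vector_type(element: type, size: int) -> type:
    """Compute a vector type from an element type and a size."""

    def __init__(self, *items):
        if len(items) != size:
            raise ValueError(f"expected {size} items, got {len(items)}")
        if not all(isinstance(item, element) for item in items):
            raise TypeError(f"every item must be of type {element.__name__}")
        self.items = tuple(items)

    name = f"Vector[{element.__name__}, {size}]"
    # The parameters are stored on the computed type itself
    return type(name, (), {"__init__": __init__, "element": element, "size": size})


IntVec3 = vector_type(int, 3)
```

The cache makes equal parameters produce the identical type object, so `vector_type(int, 3)` evaluated in two places refers to one type rather than two look-alikes.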

2.1.3.9.3.2.8 Traits & Interfaces

Traits are a way to define shared behavior that can be implemented by multiple types. Furthermore, traits can be composed to create new traits.

Traits can be used to constrain generic types, ensuring that a type parameter implements a specific set of behaviors. Traits can also be used to define dynamic dispatch rules, allowing methods to be called on values of different types that implement the same trait.

2.1.3.9.3.2.9 Type Dispatch

When a value is used in an expression, the type of the value is determined through type dispatch.

2.1.3.9.3.2.10 Auto Type Detection

When defining variables, functions, classes, and so on, if the type is not explicitly specified, the type will be inferred from the context.

2.1.3.9.3.3 Object System

The object is the core concept of the Lilies language. Although types in Lilies cannot inherit from other types in the traditional sense, the object system still provides other ways to achieve polymorphism and code reuse.

A class defines only the structure of an object; methods are implemented separately. With traits, method implementations can be shared across different classes, and object behaviour can be extended outside the class definition.

The concept of a generic function is borrowed from CLOS and renamed interface in Lilies. With interfaces, user-defined methods can be called in the same uniform way as ordinary functions. Another benefit is that interfaces are statically dispatched by default, making them more efficient than traditional methods.

The implement syntax creates methods for a specific class and assigns them to that class.
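A rough Python analogue of a generic function with per-class methods is `functools.singledispatch`. This is only a sketch: the classes and the `describe` function are hypothetical, and Python selects the implementation at run time from the argument's class, whereas the text says Lilies interfaces are statically dispatched by default.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Circle:
    radius: int


@singledispatch
def describe(obj) -> str:
    """Generic function: called like an ordinary function, implemented per class."""
    raise TypeError(f"no implementation of describe for {type(obj).__name__}")


@describe.register
def _(obj: Point) -> str:
    # Method for Point, defined outside the class body
    return f"x: {obj.x}; y: {obj.y}"


@describe.register
def _(obj: Circle) -> str:
    # Method for Circle, added independently of the Point method
    return f"radius: {obj.radius}"
```

Callers write `describe(value)` regardless of the class involved, which is the uniform calling convention the paragraph above attributes to interfaces.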

Some concepts are still borrowed from traditional OOP languages:

  • Fields: named slots associated with a specific object instance.
  • Properties: named slots that are used for value fetching only.

All objects in Lilies are referenced by value by default. To have an object referenced by reference, use a type wrapper.

A type wrapper can be an owning pointer, a garbage-collected pointer, or a reference-counted pointer.

This part describes the object system, the definition of classes, and their possible literals.

2.1.3.9.3.3.1 Primitive Object

Primitive objects in Lilies are built upon primitive types. Some primitive objects can be written in literal syntax.

Primitive objects cannot be split into smaller parts.

The literal forms are:

  • Integer Object

    • [1-9][0-9]*
    • 0b[01]+
    • 0o[0-7]+
    • 0x[0-9a-fA-F]+
  • Float Object

    • [0-9]+\.[0-9]*([eE][+-]?[0-9]+)?
    • \.[0-9]+([eE][+-]?[0-9]+)?
    • [0-9]+[eE][+-]?[0-9]+
  • Character Object

    • #\description
    • #\'character
    • #\uXXXX
  • String Object

    • "string content"
    • #f"string content with escapes"
    • #b"raw string content"
  • Symbol Object

    • 'symbol-name
  • Boolean Object

    • #True
    • #False
  • Pair Object

    • '(first . second)

Above, quote syntax is used to create the literals for symbols and pairs.
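The numeric literal patterns listed above can be tried out directly with Python's `re` module. The patterns are copied from the list; the function name `classify_number` and the decision to test integers before floats are assumptions of this sketch.

```python
import re

# Patterns copied from the integer and float literal forms listed above
INTEGER_PATTERNS = [
    r"[1-9][0-9]*",
    r"0b[01]+",
    r"0o[0-7]+",
    r"0x[0-9a-fA-F]+",
]
FLOAT_PATTERNS = [
    r"[0-9]+\.[0-9]*([eE][+-]?[0-9]+)?",
    r"\.[0-9]+([eE][+-]?[0-9]+)?",
    r"[0-9]+[eE][+-]?[0-9]+",
]


def classify_number(text: str) -> str:
    """Return 'integer', 'float', or 'unknown' for a candidate literal."""
    if any(re.fullmatch(pattern, text) for pattern in INTEGER_PATTERNS):
        return "integer"
    if any(re.fullmatch(pattern, text) for pattern in FLOAT_PATTERNS):
        return "float"
    return "unknown"
```

Running the patterns exactly as written shows one property worth knowing: a bare `0` matches none of them, since the decimal integer form requires a leading digit from 1 to 9. Whether that is intended (for example because zero is handled through the Zero type) is not stated in the text.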

2.1.3.9.3.3.2 Classes, Fields, Properties & Traits

Classes are user-defined structure types.

A class can explicitly declare that it inherits from a parent class, but doing so does not change the class structure. If a class is declared to have a parent class, it must implement all traits that its parent class implements.

Fields are named slots associated with a specific object instance. Each field has its own name and type. In a class definition, fields are declared with the define syntax.

Properties are named slots that are used for value fetching only. There are various ways to declare a field as a property; using setter and getter methods is one common way. However, it is encouraged to assign accessibility attributes to fields manually, to control read and write access at the internal, class-internal, package-internal, and public access levels.

Traits define shared behavior that can be implemented by multiple classes. Traits can be implemented manually for a class, and user-defined traits can be used to extend the behavior of a library-defined class.

2.1.3.9.3.3.2.1 Definition of Classes

Define a new class with the class syntax. E.g., to define a new class Point with two fields x and y of type Integer:

(define Point
  (class
    (define x Integer)
    (define y Integer)))

Here, the outer define binds the name Point to the class created by the class syntax, and each define inside the class body declares a field, here x and y of type Integer. An annotation of the form #:self this declares that, within the class body, this refers to the current instance of the class. Symbols starting with #: are keyword annotations, which pass attributes at function or macro application. Symbols starting with #& are another kind of keyword annotation, which pass attributes at function or macro definition. The most general annotations are written #@[attributes] and are attached to expressions. A later chapter describes all of these annotations in detail.

The full syntax of a class definition is:

class-definition ::=
'(' 'class' <inherits>
   { <fields> } ')'

<inherits>       ::= '(' { <class> } ')'
<fields>         ::=
'(' ':fields' { <deffield> } ')'

<deffield>       ::=
'(' 'define' <name> [ '#:type' ] <type> [ <init> ] ')'

The inherits clause declares the superclasses of the class being defined. The self clause declares the symbol that refers to the current instance of the class within the class body. The type clause declares the type of the class being defined.

With annotations, the accessibility of fields can be controlled. E.g.,

(define Point
  (class
    #@[accessibility x (read :public) (write :private)]
    #@[accessibility y (read :public) (write :private)]
    (define x Integer)
    (define y Integer)))

To declare a field as a variable, wrap its type with variable.
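For readers more familiar with mainstream OOP languages, the public-read, private-write combination used in the accessibility example corresponds roughly to a read-only property. The Python class below is a hypothetical analogue; note that Python's underscore prefix is only a convention, so the write restriction is weaker than a compiler-enforced annotation would be, and `move_by` is an invented method used to show an internal write.

```python
class Point:
    """Fields readable from anywhere, writable only through the class's own methods."""

    def __init__(self, x: int, y: int) -> None:
        self._x = x
        self._y = y

    @property
    def x(self) -> int:
        return self._x  # read access is public

    @property
    def y(self) -> int:
        return self._y

    def move_by(self, dx: int, dy: int) -> None:
        # Writes happen only inside the class
        self._x += dx
        self._y += dy
```

Assigning to `p.x` from outside raises `AttributeError`, because the property defines no setter.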

2.1.3.9.3.3.2.2 Definition of Traits

Define a new trait with the trait syntax. E.g., to define a new trait Drawable with a method draw:

(define Drawable
  (trait
    #:self self
    (define draw (function (self)))))
2.1.3.9.3.3.2.3 Method and Trait Implementation

Both methods and traits are implemented with the implement syntax.

implement unwraps the namespace of a class, and the methods defined within its body are assigned to the class function table. Furthermore, traits can unwrap the namespace of an object, in which case anything inside only extends the behavior of that object.

(implement Point (Drawable)
  #:self self
  #:Type Self
  (define draw
    (lambda (self)
      #:returns (None)
      (print f"x: {(field self 'x)}; y: {(field self 'y)}"))))
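The effect of implementing a trait for an existing class (checking that the trait is covered, then placing the methods in the class function table) can be mimicked in Python. The `implement` helper, the tuple encoding of a trait, and the `traits` attribute are all hypothetical; the `draw` method also returns its text instead of printing, purely so the result is easy to inspect.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


# Hypothetical trait description: a name plus the method names it requires
DRAWABLE = ("Drawable", ("draw",))


def implement(cls, trait, **methods):
    """Attach `methods` to `cls` after checking that the trait is fully covered."""
    trait_name, required = trait
    missing = [name for name in required if name not in methods]
    if missing:
        raise TypeError(f"{cls.__name__} does not implement {trait_name}: missing {missing}")
    for name, function in methods.items():
        setattr(cls, name, function)  # add to the class function table
    # Record which traits the class now implements
    cls.traits = getattr(cls, "traits", frozenset()) | {trait_name}


implement(Point, DRAWABLE, draw=lambda self: f"x: {self.x}; y: {self.y}")
```

The class definition itself is never edited: `Point` gains `draw` afterwards, which is the same separation of structure and behavior that the class and implement forms express.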
2.1.3.9.3.3.2.4 Generic Function & Interface
2.1.3.9.3.3.2.5 Method Dispatch

When a method is called on an object, the method to be executed is determined through method dispatch.

2.1.3.9.3.3.2.5.1 Dynamic Dispatch
((method object 'method-name) ...args)
;; or
({method-name object} ...args) ; for short
2.1.3.9.3.3.2.5.2 Static Dispatch
((method Class 'method-name) ...args)
;; or
({method-name Class} ...args) ; for short
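The difference between the two lookup forms can be illustrated in Python, where the same contrast exists between looking a method up on an object and looking it up on a named class. The `method` helper below is a hypothetical analogue of the Lilies form, and the subclass relationship is used only to obtain two implementations of one method name.

```python
class Shape:
    def label(self) -> str:
        return "shape"


class Circle(Shape):
    def label(self) -> str:
        return "circle"


def method(target, name: str):
    """Hypothetical analogue of (method target 'name): look `name` up on `target`."""
    return getattr(target, name)


obj = Circle()

# Dynamic dispatch: the implementation is chosen from the object's runtime class
dynamic_result = method(obj, "label")()

# Static dispatch: the class is named explicitly, so its implementation is used
static_result = method(Shape, "label")(obj)
```

With the same object, the dynamic form selects the implementation of `Circle` and the static form selects the implementation of `Shape`.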
2.1.3.9.3.3.2.5.3 Method Access (Syntactic Sugar)
2.1.3.9.3.3.2.5.4 Invoke
2.1.3.9.3.3.2.6 Field & Property Access
2.1.3.9.3.3.2.7 Traits Shadowing
2.1.3.9.3.4 Expression
2.1.3.9.3.5 Apply & Evaluation
  1. Apply & Evaluation

    1. Value Pass
    2. Reference Pass

      1. Ownership transaction
      2. Move
      3. Borrow
2.1.3.9.3.6 Variable, Binding & Reference
  • Variable, Definition & Binding

    • Dynamic Scope
    • Lexical Scope
    • define
    • let & let: family
    • Dynamic In Lexical Scope
  • Form
  • Assignment
2.1.3.9.3.7 Procedure, Function & Method
  • Functions

    • Parameters
    • Rest Parameters
    • Parameter Stack
    • Return Values
    • Multiple Values Returning
    • Function Call
    • Multiple Value for Function Call
2.1.3.9.3.8 Name Space, Lexical Scope, Dynamic Scope, Closure
2.1.3.9.3.9 Generics
  1. Generics: Template

    1. Generic Macro
2.1.3.9.3.10 Macro
  1. Macro

    1. History: Compile-time calculation
    2. History: C-Style Macro
    3. History: defmacro
    4. Procedure Macro
    5. Hygiene for the Unhygienic Macro
  2. Syntax Rules

    1. History: Hygienic Macro
    2. Syntax Object
2.1.3.9.3.11 Symbol Generation
2.1.3.9.3.11.1 Expression Tree
2.1.3.9.3.12 Memory Management
  1. Pointer

    1. Reference Count
    2. Unique Ownership
    3. Raw Pointer
    4. Address
    5. Virtual Method Table: How dynamic dispatch is implemented
  2. Ownership
  3. Garbage Collection
  4. Allocation

    1. alloc:stack : Object Allocated in Stack
    2. alloc:heap : Object Allocated in Heap
    3. new : Object creation
  5. Auto Life-cycle Detection
2.1.3.9.3.13 Continuations
2.1.3.9.3.14 Exception Handling
  1. Condition System
2.1.3.9.3.15 Module & Library
2.1.3.9.3.16 Top-Level
2.1.3.9.4
2.1.3.10  Margarita: Language as extension for Lilies in M-Expression [Margarita]
2.1.3.10.1 Abstract
2.1.3.10.2 Introduction
2.1.3.10.2.1 Background
2.1.3.10.2.2 Guiding Principle
2.1.3.11  STD: Standard Library For D-Flat System [StandardLibrary]
2.1.3.12  Turing Machine Simulator (R-M) for Project D-Flat [Turing]
2.1.3.12.1 Turing Machine Virtual Machine Design

The virtual machine works much like a real CPU with memory.

The virtual machine has the following properties:

  • 32-bit instruction width
  • 64-bit register size
  • 32 general-purpose registers
  • 32 special-purpose registers

The virtual machine adopts a newly designed instruction set.
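The listed properties translate into a small register-file model. The sketch below encodes only the stated sizes; the class name, the `read`/`write` methods, and the choice to wrap values modulo 2^64 on write are assumptions, since the instruction set itself is not described here.

```python
from dataclasses import dataclass, field

INSTRUCTION_BITS = 32      # width of one instruction
REGISTER_BITS = 64         # width of one register
GENERAL_REGISTERS = 32     # number of general-purpose registers
SPECIAL_REGISTERS = 32     # number of special-purpose registers
REGISTER_MASK = (1 << REGISTER_BITS) - 1


@dataclass
class RegisterFile:
    """Hypothetical register file matching the sizes listed above."""
    general: list = field(default_factory=lambda: [0] * GENERAL_REGISTERS)
    special: list = field(default_factory=lambda: [0] * SPECIAL_REGISTERS)

    def write(self, index: int, value: int) -> None:
        # Assumption: values wrap to the register width instead of trapping
        self.general[index] = value & REGISTER_MASK

    def read(self, index: int) -> int:
        return self.general[index]
```

One consequence of the numbers alone: 32 registers need a 5-bit index, so an instruction naming three general-purpose registers spends 15 of its 32 bits on operands.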

2.1.3.12.2  Architecture Overview [Overview]

The virtual machine works in a register-memory architecture.

File:                                                 Memory:
+--------------------------------------+      +-------------------------------+
| Archive                              |      |  +--------------------++++    |
| +----------+    +---------------+++  |      |  | Global Data Stack  ||||    |
| | Global   | +->| Function Unit |||| |      |  |--------------------++++    |
| | +--+++   | |  | +--+++  +-++  |||| |      |  | Text Vector           |    |
| | | D |||  | |  | | T ||| |D||| |||| |      |  +-----------------------+    |
| | | a |||  | |  | | e ||| |a||| ||||==========>| Function Unit Vector  |<-+ |
| | | t |||  | |  | | x ||| |t||| |||| |      |  | +-------------------++|  | |
| | | a |||  | |  | | t ||| |a||| |||| |      |  | | +---------------+ |||  | |
| | +--+++   | |  | +--+++  +-++  |||| |      |  | | | Data Vector   | |||  | |
| | +--+++   | |  | +--+++  +-++  |||| |      |  | | | +---------+++ | |||  | |
| | | E |||  | |  | | C ||| |C||| |||| |      |  | | | | Literal ||| | |||  | |
| | | n |||  | |  | | o ||| |l||| |||| |      |  | | | +---------+++ | |||  | |
| | | t -------+  | | s ||| |o||| |||| |      |  | | | | capture ||| | |||  | |
| | | e |||  |    | | t ||| |s||| |||| | Load |  | | | +---------+++ | |||  | |
| | | r |||  |    | | a ||| |u||| |||| | ===> |  | | | | data    ||| | |||  | |
| | | y |||  |    | | n ||| |r||| |||| |      |  | | | +---------+++ | |||  | |
| | +--+++   |    | | t ||| |e||| |||| |      |  | | +---------------+ |||  | |
| |          |    | +--+++  +-++  |||| |      |  | | | Text          | |||  | |
| |          |    |               |||| |      |  | | +---------------+ |||  | |
| +----------+    +---------------+++  |      |  | +------------------+++|  | |
|                                      |      |  +---------------------+++  | |
+--------------------------------------+      |  | Execution Stack     |||  | |
                                              |  | +---------+++       |||  | |
                                              |  | | Pointer ---------------+ |
                                              |  | +---------+++       |||    |
                                              |  +---------------------+++    |
                                              |  | Register Records    |||    |
                                              |  +---------------------+++    |
                                              +-------------------------------+

When the program is stored in a file, the file includes the following sections:

  • Global Data section: stores global variable data
  • Global Entry section: stores all entry points of data and functions
  • Function Unit section: stores all function units. A Function Unit includes:

    • Text section: stores the instructions to be executed
    • Data section: stores constants and variables used only in this function
    • Constant section: stores immediate values used in this function
    • Closure section: stores relocation information for captured variables

In memory, there are the following segments:

  • Global Data Stack: all functions share the same global data stack, which is used for variable storage, argument passing, etc. It works like the normal stack in x86_64 assembly. The Global Data Stack initially stores the global variable data, and function frames are constructed on it as functions are called. A Global Data Stack can be at most 4 GiB in size. The register Reg#SS points to the base of the data stack segment currently in use. The global data stack may be duplicated and stored in a new data stack segment when a continuation, a fork, or an extremely large stack allocation is invoked; Reg#SS is then updated to point to the base of the new data stack segment. The data stack duplicate can also be used for snapshot purposes.
  • Text Vector: every function's text segment is loaded into the text vector. The Text Vector stores all text segments in the program.
  • Function Unit Vector: every function has its own function unit, including a text segment and a data segment. The Function Unit Vector stores all function units in the program. It maps a function index to a function unit: a pointer refers to the corresponding function unit in the Function Unit Vector. If a function unit is not referenced by any pointer, its slot is freed and can be reused. A Function Unit includes:

    • Text Segment: a pointer to the text segment in the text vector.
    • Data Segment: every function has its own data segment, storing literal data and captured data, the latter being pointers to the global data stack.

      • Literal Section: literals are constants that may be used in the function or as instruction parameters but cannot be embedded in an instruction directly.
      • Capture Section: captured data are pointers to the global data stack or to heap data. Every pointer must be pushed into the capture section and deleted when the function no longer holds it. If a parameter is a captured pointer, the pointer must be pushed into the capture section. No pointer may be stored in the global data stack except in the argument passing area.
      • Data Section: other data used in the function.

    The Literal Section is loaded from the file into the data segment directly. The Capture Section is constructed when the function unit is constructed. The Data Section is loaded from the file into the data segment directly.

  • Execution Stack: every function call pushes a pointer to the corresponding function unit in the function unit vector onto the execution stack. The execution stack stores function call frame pointers. The Execution Stack can be duplicated and stored in a new execution stack segment when a continuation or a fork is invoked; Reg#ES is then updated to point to the new execution stack segment.
  • Register Records: the register records store all register values, which change as instructions are executed. Register records are saved and restored when snapshot exception handling is invoked.
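The memory layout described above can be pictured as a handful of plain containers. The following Python sketch is only an illustration under assumptions: the class and field names (DataSegment, FunctionUnit, Memory, alloc_unit, free_unit) are hypothetical, pointers are modelled as list indices, and slot reuse is done in the simplest way that matches the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataSegment:
    literal: List[int] = field(default_factory=list)  # loaded from the file
    capture: List[int] = field(default_factory=list)  # pointers, built at construction
    data: List[int] = field(default_factory=list)     # loaded from the file


@dataclass
class FunctionUnit:
    text_index: int  # stands in for the pointer into the text vector
    data_segment: DataSegment = field(default_factory=DataSegment)


@dataclass
class Memory:
    global_data_stack: List[int] = field(default_factory=list)
    text_vector: List[List[int]] = field(default_factory=list)
    function_units: List[Optional[FunctionUnit]] = field(default_factory=list)
    execution_stack: List[int] = field(default_factory=list)  # indices into function_units
    registers: List[int] = field(default_factory=lambda: [0] * 64)  # 32 + 32 registers

    def alloc_unit(self, unit: FunctionUnit) -> int:
        """Place a unit in the first free slot, reusing freed slots."""
        for index, slot in enumerate(self.function_units):
            if slot is None:
                self.function_units[index] = unit
                return index
        self.function_units.append(unit)
        return len(self.function_units) - 1

    def free_unit(self, index: int) -> None:
        """Free a slot once no pointer references the unit any more."""
        self.function_units[index] = None
```

How the virtual machine decides that a function unit is no longer referenced is not specified here, so free_unit is left as an explicit call.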
2.1.3.12.3  Register [Register]

The registers can be divided into two kinds:

  • General-Purpose Registers
  • Special-Purpose Registers

All registers are 64 bits long, and each can be identified by a 6-bit number.

General-purpose registers can be accessed by the user freely and can be updated by any instruction. Changing a general-purpose register does not affect any other register or the execution status of the virtual machine.

Special-purpose registers reflect the execution status of the virtual machine. Their values may be changed automatically by the virtual machine or by instructions. The read-write abilities given below for each special-purpose register are suggestions only.

Changing special-purpose registers directly is not recommended, though every special-purpose register can be read and written like a general-purpose register.

Register names start with "Reg#", followed by a number or a string.

General-purpose registers have only numbers as their names, for example Reg#0, Reg#1, … There are only 32 general-purpose registers available.

Special-purpose registers each have their own name and their own code (number):

  • Result discarding:

    • Ignore: Reg#Ign, code 0x3f, any value moved into it is ignored.
  • Arithmetic computation and results:

    • Accumulator: Reg#A, code 0x3e, for the result of ADD, SUB, MUL, and DIV, or a return value
    • Counter: Reg#C, code 0x3d, for loop counts
    • Remainder: Reg#R, code 0x3c, for the remainder of DIV, or a return value
  • Execution locating:

    • Program Counter: Reg#PC, code 0x3b, for the next instruction to be executed
    • Execution Stack Pointer: Reg#EP, code 0x3a, for the current execution frame in the execution stack
    • Execution Segment: Reg#ES, code 0x39, for the execution stack segment
  • Stack locating:

    • Stack Base Pointer: Reg#BP, code 0x38, for the current stack frame base
    • Stack Top Pointer: Reg#SP, code 0x37, for the current stack frame top
    • Stack Segment: Reg#SS, code 0x36, for the stack segment
  • Condition reflecting:

    • Flags: Reg#FLAGS, code 0x35, for flags after instruction execution
    • Tests: Reg#TESTS, code 0x34, for the test condition
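The codes listed above fit in a small lookup table. The Python sketch below shows one way an assembler might translate a register name into its 6-bit code. The function name register_code is hypothetical. The sketch also assumes that general-purpose numbers are written in hexadecimal (as in Reg#1F), and that a general-purpose register whose number would collide with a special name is spelled with a leading zero (for example Reg#0A), since Reg#A already names the accumulator.

```python
SPECIAL_REGISTERS = {
    "Ign": 0x3F, "A": 0x3E, "C": 0x3D, "R": 0x3C,
    "PC": 0x3B, "EP": 0x3A, "ES": 0x39,
    "BP": 0x38, "SP": 0x37, "SS": 0x36,
    "FLAGS": 0x35, "TESTS": 0x34,
}


def register_code(name: str) -> int:
    """Translate a name such as 'Reg#1F' or 'Reg#SP' into its 6-bit code."""
    if not name.startswith("Reg#"):
        raise ValueError(f"not a register name: {name!r}")
    suffix = name[len("Reg#"):]
    # Special names win over numbers, so 'Reg#A' is the accumulator.
    if suffix in SPECIAL_REGISTERS:
        return SPECIAL_REGISTERS[suffix]
    code = int(suffix, 16)  # assumption: general-purpose numbers are hexadecimal
    if not 0 <= code <= 0x1F:
        raise ValueError(f"no such general-purpose register: {name!r}")
    return code
```

For instance, register_code("Reg#1F") gives 31 and register_code("Reg#A") gives 0x3e.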
2.1.3.12.3.1 General-Purpose Registers: Reg#n, n for number

General-purpose registers, from Reg#0 to Reg#1F (31), can be accessed by the user freely.

2.1.3.12.3.2 Ignore: Reg#Ign

Ignores every value moved into it.

Assign-only register: a special-purpose register that can be accessed by the user. If the user tries to read a value from it, the result is always zero.
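The behaviour of Reg#Ign is easy to see in a toy register file. This is a minimal sketch and not the real implementation: the class name RegisterFile is hypothetical, and truncating written values to 64 bits is an assumption based on the stated register size.

```python
MASK64 = (1 << 64) - 1
REG_IGN = 0x3F  # code of Reg#Ign


class RegisterFile:
    def __init__(self) -> None:
        self._values = [0] * 64  # one slot per 6-bit register code

    def write(self, code: int, value: int) -> None:
        if code == REG_IGN:
            return  # anything moved into Reg#Ign is discarded
        self._values[code] = value & MASK64  # assumption: wrap to 64 bits

    def read(self, code: int) -> int:
        if code == REG_IGN:
            return 0  # reading Reg#Ign always yields zero
        return self._values[code]
```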

2.1.3.12.3.3 Accumulator, Counter, Remainder: Reg#A, Reg#C, Reg#R

Every result of ADD, SUB, MUL, and DIV may be assigned to Reg#A, the accumulator.

Loop counts may rely on Reg#C, the counter. If the LOOP instruction is used, Reg#C is decremented by one automatically.

The remainder of DIV may be assigned to Reg#R, the remainder register.

Read-write registers: special-purpose registers that can be accessed by the user.

Return values can be passed between functions without using the stack; in that case Reg#A and Reg#R are used for return value passing.
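As a rough illustration of how these three registers cooperate, the Python sketch below models DIV and LOOP on a dictionary of register values. The function names exec_div and exec_loop are hypothetical. The rule that the loop continues while Reg#C is non-zero is an assumption borrowed from x86, and only non-negative operands are considered.

```python
from typing import Dict


def exec_div(regs: Dict[str, int], dividend: int, divisor: int) -> None:
    """Quotient goes to Reg#A, remainder to Reg#R (non-negative operands)."""
    quotient, remainder = divmod(dividend, divisor)
    regs["A"] = quotient
    regs["R"] = remainder


def exec_loop(regs: Dict[str, int]) -> bool:
    """Decrement Reg#C by one and report whether the loop should continue."""
    regs["C"] -= 1
    return regs["C"] != 0  # assumption: x86-style termination at zero
```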

2.1.3.12.3.4 Program Counter, Execution Stack Pointer: Reg#PC, Reg#EP

Reg#PC points to the next instruction to be executed in the current function frame.

Reg#EP points to the current execution frame in the execution stack.

Both are also used to provide unwind information.

Read-only registers; writing to them directly is not recommended, because a write affects the execution status of the virtual machine. If the value written into Reg#PC is within the text segment of the current function frame, the next instruction to be executed changes accordingly. If the value written into Reg#EP is out of range, the virtual machine raises an exception. If the value written into Reg#EP is less than the current top of the execution stack, the virtual machine unwinds the execution stack to the target frame. If the value written into Reg#EP is larger than the current top of the execution stack, the virtual machine raises an exception.
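The rules for writing Reg#EP can be sketched as follows. This is an illustration under assumptions: Reg#EP is treated as a plain index into a Python list standing in for the execution stack, and VMException and write_ep are hypothetical names.

```python
from typing import List


class VMException(Exception):
    """Hypothetical stand-in for an exception raised by the virtual machine."""


def write_ep(execution_stack: List[int], new_ep: int) -> None:
    """Apply a direct write to Reg#EP, treating it as a frame index."""
    top = len(execution_stack) - 1
    if new_ep < 0 or new_ep > top:
        # Out of range, or above the current top of the execution stack.
        raise VMException(f"Reg#EP value {new_ep} is outside the execution stack")
    # Below (or at) the current top: unwind to the target frame.
    del execution_stack[new_ep + 1:]
```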

2.1.3.12.3.5 Stack Segment, Stack Pointer, Base Pointer: Reg#SS, Reg#SP, Reg#BP

Reg#SS references the Data Stack Segment, with offsets up to 2^32 (i.e., 4 GiB). In most cases Reg#SS is not changed, since the data stack works like a normal stack of small size.

Reg#SP references the stack top of the current function frame.

Reg#BP references the stack base of the current function frame.

Reg#SP and Reg#BP are never less than 0 and never larger than the segment length, though they are 64-bit pointers (52 bits for addressing).

Read-write registers; writing to them directly is not recommended. The values of Reg#SP and Reg#BP change automatically when push / pop / call / ret instructions are executed. The value of Reg#SS usually does not change, unless the user allows an extremely large stack to be allocated dynamically.

The user can write Reg#SP and Reg#BP directly to change the stack frame, and can write Reg#SS directly to change the stack segment base. If Reg#SS is changed and not restored before returning from the function, other function frames may behave incorrectly.

2.1.3.12.3.6 Flags, Test: Reg#FLAGS, Reg#TESTS

After any instruction, Reg#FLAGS is set according to the execution result.

The :TEST cond, jmp instruction sets Reg#TESTS according to cond and checks the result of :AND Reg#TESTS, Reg#FLAGS. If cond is true, execution jumps to the target jmp.

There are some literals for cond:

  • Test#g
  • Test#ng
  • Test#l
  • Test#nl
  • Test#e
  • Test#o
  • Test#no

Any literal 16-bit value is also acceptable.

The meaning of the flag bits in Reg#FLAGS is as follows:

0x
              00              08              10              18              20
                                                                              40
              |0 0 0 0 0 0 0 0|0 0 1 1 1 1 1 1|1 1 1 1 2 2 2 2|2 2 2 2 2 2 3 3|
            => 3 3 3 3 3 3 3 3|4 4 4 4 4 4 4 4|4 4 5 5 5 5 5 5|5 5 5 5 6 6 6 6|
Decimal       |0 1 2 3 4 5 6 7|8 9 0 1 2 3 4 5|6 7 8 9 0 1 2 3|4 5 6 7 8 9 0 1|
            => 2 3 4 5 6 7 8 9|0 1 2 3 4 5 6 7|8 9 0 1 2 3 4 5|6 7 8 9 0 1 2 3|
--------------------------------------------------------------------------------
              00                              10                              20
                                                                              40
Default       |C|1|P|0|A|0|Z|S|T|I|D|O|IOP|N|0|R|V|A|V*V*I|                   |
              |F| |F| |F| |F|F|F|F|F|F|L  |T| |F|M|C|F|P|D|                   |
            =>                                                                |
              | Exception code                                                |
* VF <- VIF; VP <- VIP
  • CF: Carry Flag
  • PF: Parity Flag
  • AF: Auxiliary Carry Flag
  • ZF: Zero Flag
  • SF: Sign Flag
  • TF: Trap Flag

The exception code is passed to the exception interrupt handler when an exception is raised.

Write operations on these registers have no effect.
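To make the flag test concrete, here is a small Python sketch. The bit positions are read off the layout above (bit 0 is CF, bit 2 is PF, bit 4 is AF, bit 6 is ZF, bit 7 is SF, bit 8 is TF). Treating a non-zero result of the AND as "condition holds" is an assumption, and so is the helper name branch_taken. The encodings of Test#g, Test#e, and the other literals are not given here, so the sketch uses raw masks instead.

```python
CF = 1 << 0  # Carry Flag
PF = 1 << 2  # Parity Flag
AF = 1 << 4  # Auxiliary Carry Flag
ZF = 1 << 6  # Zero Flag
SF = 1 << 7  # Sign Flag
TF = 1 << 8  # Trap Flag


def branch_taken(tests: int, flags: int) -> bool:
    """Model ':AND Reg#TESTS, Reg#FLAGS'; assume non-zero means 'jump'."""
    return (tests & flags) != 0
```

With flags set to ZF | CF, a test mask of ZF is taken while a test mask of SF is not. Negated conditions such as Test#ng cannot be expressed by this simple rule, so the real mechanism presumably carries more information than this sketch does.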

2.1.3.12.4  Pointer Specification [Pointer]

A pointer in this virtual machine is a 64-bit unsigned integer stored in the capture section of a function unit.

A pointer uses 46 bits for the address and 6 bits to identify the pointer type; the remaining 12 bits are reserved for future use. The address is divided into two parts: the Pointer Base Address (PBA) and the Segment.

0x
              00              08              10              18              20
                                                                              40
              |0 0 0 0 0 0 0 0|0 0 1 1 1 1 1 1|1 1 1 1 2 2 2 2|2 2 2 2 2 2 3 3|
            => 3 3 3 3 3 3 3 3|4 4 4 4 4 4 4 4|4 4 5 5 5 5 5 5|5 5 5 5 6 6 6 6|
Decimal       |0 1 2 3 4 5 6 7|8 9 0 1 2 3 4 5|6 7 8 9 0 1 2 3|4 5 6 7 8 9 0 1|
            => 2 3 4 5 6 7 8 9|0 1 2 3 4 5 6 7|8 9 0 1 2 3 4 5|6 7 8 9 0 1 2 3|
--------------------------------------------------------------------------------
              00                              10                              20
                                              30              38              40
Default       | Pointer Base Address                                          |
            =>  PBA(c.)       | Segment   | Type      |                       |

The Type field defines the type of the pointer. The defined pointer types are:

  • Heap Pointer: type code 0x00, points to a heap-allocated memory block.
  • Stack Pointer: type code 0x01, points to a global data stack location.
  • Function Pointer: type code 0x02, points to a function unit entry point.
  • Text Pointer: type code 0x03, points to the text vector.
  • Data Pointer: type code 0x04, points to the data section in a function unit.
  • Constant Pointer: type code 0x05, points to the constant section in a function unit.
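Packing and unpacking such a pointer is plain bit arithmetic. The Python sketch below assumes the field widths that the diagram's columns suggest: 40 bits of PBA in the lowest bits, then 6 bits of Segment, then 6 bits of Type, with the top 12 bits reserved. The text itself only fixes the 46 + 6 + 12 split, so the exact positions and the function names pack_pointer and unpack_pointer are assumptions.

```python
from typing import Tuple

PBA_BITS, SEGMENT_BITS, TYPE_BITS = 40, 6, 6  # assumption: read off the diagram
SEGMENT_SHIFT = PBA_BITS
TYPE_SHIFT = PBA_BITS + SEGMENT_BITS

POINTER_TYPES = {
    0x00: "heap", 0x01: "stack", 0x02: "function",
    0x03: "text", 0x04: "data", 0x05: "constant",
}


def pack_pointer(pba: int, segment: int, ptype: int) -> int:
    """Combine the three fields into one 64-bit pointer value."""
    for value, bits in ((pba, PBA_BITS), (segment, SEGMENT_BITS), (ptype, TYPE_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"field value {value} does not fit in {bits} bits")
    return pba | (segment << SEGMENT_SHIFT) | (ptype << TYPE_SHIFT)


def unpack_pointer(pointer: int) -> Tuple[int, int, int]:
    """Split a pointer into (pba, segment, type); reserved bits are ignored."""
    pba = pointer & ((1 << PBA_BITS) - 1)
    segment = (pointer >> SEGMENT_SHIFT) & ((1 << SEGMENT_BITS) - 1)
    ptype = (pointer >> TYPE_SHIFT) & ((1 << TYPE_BITS) - 1)
    return pba, segment, ptype
```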
2.1.3.12.5 Interrupt and Exception Handling

Interrupts are handled through the interrupt dispatch table. The first 128 entries of the function unit vector are reserved for interrupt handling. When an interrupt is invoked, the virtual machine does the following:

  1. Stores the current execution status by pushing all registers onto the global data stack
  2. Invokes the interrupt handler function unit from the interrupt dispatch table
  3. After the interrupt handler function unit returns, restores the previous execution status by popping all registers from the global data stack

Exceptions are handled by invoking the exception handler function unit. The default exception handler function unit is at index 0 in the function unit vector. It is a special interrupt handler.

Interrupt handlers must be provided by the program. If no interrupt handler is provided, the virtual machine writes the interrupt dispatch table entry to point to the default exception handler, a function unit that displays exception trace information and halts the virtual machine.

Exception handling proceeds as described for the raise instruction.
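The three interrupt steps and the fallback to the default handler can be sketched in a few lines of Python. Everything here is illustrative: the Machine class, its fields, and the way registers are saved as one dictionary per interrupt are assumptions, and the default handler only sets a halted flag instead of printing a trace.

```python
from typing import Callable, Dict, List, Optional

Handler = Callable[[Dict[str, int]], None]
INTERRUPT_ENTRIES = 128  # first 128 entries are reserved for interrupts


class Machine:
    def __init__(self) -> None:
        self.regs: Dict[str, int] = {"A": 0, "PC": 0}
        self.data_stack: List[Dict[str, int]] = []
        self.halted = False
        self.dispatch: List[Optional[Handler]] = [None] * INTERRUPT_ENTRIES
        self.dispatch[0] = self.default_handler  # default exception handler

    def default_handler(self, regs: Dict[str, int]) -> None:
        self.halted = True  # a real handler would also display a trace

    def interrupt(self, idx: int) -> None:
        # Entries without a handler fall back to the default handler.
        handler = self.dispatch[idx] or self.dispatch[0]
        self.data_stack.append(dict(self.regs))  # 1. save all registers
        handler(self.regs)                       # 2. run the handler
        self.regs = self.data_stack.pop()        # 3. restore all registers
```

Because the registers are restored in step 3, a handler in this sketch cannot hand a result back through a register; whether the real design allows that is not stated.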

2.1.3.12.6 Model

The execution model of the virtual machine has the following steps:

  1. Load function units into the function unit vector
  2. Initialize the global data stack
  3. Initialize the execution stack
  4. Initialize the register records
  5. Start execution from the main function

When a function call is invoked, the virtual machine does the following:

  1. Push the current function frame pointer onto the execution stack
  2. Create a new function frame in the global data stack
  3. Push local variables onto the global data stack
  4. Start execution from the called function

When a function return is invoked, the virtual machine does the following:

  1. Move the return value into Reg#A and Reg#R. If the return value is larger than two registers can represent, move it into the pre-allocated space in the global data stack and move the pointer to it into Reg#A.
  2. Pop the current function frame from the execution stack.
  3. Clean up the current function frame in the global data stack.
  4. Resume execution from the previous function frame.

When a snapshot exception is invoked, the virtual machine does the following:

  1. Duplicate the current global data stack segment, execution stack segment, and register records.
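The snapshot step amounts to taking an independent copy of the three pieces of state. A minimal Python sketch, assuming a hypothetical MachineState container and using a deep copy in place of whatever segment duplication the virtual machine really performs:

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MachineState:
    data_stack: List[int] = field(default_factory=list)
    execution_stack: List[int] = field(default_factory=list)
    registers: Dict[str, int] = field(default_factory=dict)


def snapshot(state: MachineState) -> MachineState:
    """Duplicate the data stack, the execution stack, and the register records."""
    return copy.deepcopy(state)
```

Later changes to the running state leave the snapshot untouched, which is the property needed to restore it afterwards.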
2.1.3.12.7  Call Convention [Call]

When a function is about to be called, the caller must do the following:

  1. Reserve space for the return value in the global data stack
  2. Push the function arguments onto the global data stack, with the leftmost argument pushed last
  3. Move the return value address into Reg#A
  4. Invoke the call instruction with the function

When a function is called, the callee must do the following:

  1. Create a new function frame in the global data stack, storing the previous stack base pointer and stack top pointer
  2. Store the return value address from Reg#A into the function frame
  3. Push local variables onto the global data stack

When a function is about to return, the callee must do the following:

  1. Move the return value accordingly: if the function's signature returns the value by register, move the return value into Reg#A and Reg#R; otherwise move it into the pre-allocated space in the global data stack
  2. Restore the previous stack base pointer and stack top pointer from the function frame
  3. Invoke the ret instruction

When a function has returned, the caller must do the following:

  1. Clean up the function arguments from the global data stack
  2. Resume execution from the previous function frame
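The caller and callee duties above can be traced with a toy simulation. The Python sketch below is only one possible reading: the stack is a list that grows upward, the frame layout (saved BP, saved SP, return address, one local) is hypothetical, the return value is passed through the reserved stack slot, and call_add and callee_add are made-up names.

```python
from typing import Dict, List


def callee_add(stack: List[int], regs: Dict[str, int]) -> None:
    # Callee prologue
    stack.extend([regs["BP"], regs["SP"]])  # 1. frame stores previous BP and SP
    regs["BP"] = len(stack) - 2
    stack.append(regs["A"])                 # 2. keep the return value address
    stack.append(0)                         # 3. one local variable
    regs["SP"] = len(stack)
    bp = regs["BP"]
    # Body: arguments sit just below the frame; x was pushed last.
    x, y = stack[bp - 1], stack[bp - 2]
    stack[bp + 3] = x + y
    # Callee epilogue
    stack[stack[bp + 2]] = stack[bp + 3]    # 1. result into the reserved slot
    regs["BP"], regs["SP"] = stack[bp], stack[bp + 1]  # 2. restore BP and SP
    del stack[regs["SP"]:]                  # 3. ret drops the frame in this sketch


def call_add(stack: List[int], regs: Dict[str, int], x: int, y: int) -> int:
    stack.append(0)                         # 1. reserve return value space
    return_slot = len(stack) - 1
    for argument in (y, x):                 # 2. leftmost argument pushed last
        stack.append(argument)
    regs["A"] = return_slot                 # 3. return value address in Reg#A
    regs["SP"] = len(stack)
    callee_add(stack, regs)                 # 4. call
    del stack[-2:]                          # after return: clean up arguments
    result = stack.pop()                    # read and release the return slot
    regs["SP"] = len(stack)
    return result
```

Starting from an empty stack, call_add(stack, regs, 2, 40) returns 42 and leaves both the stack and the BP and SP values as they were before the call.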
2.1.3.12.8  Instruction Specification [Instruction]

All instructions adopted in the virtual machine have a fixed length of 32 bits.

Instructions have four addressing methods:

  • No addressing: no parameter is accepted
  • Register addressing: the parameter is a register
  • Immediate addressing: the parameter is a literal value
  • Memory addressing: the parameter is a memory address

Based on the addressing methods above, instructions can be divided into the following categories:

  • Zero operand instruction: no parameter

  • Register operand instruction: only register parameter

  • Immediate operand instruction: only literal parameter

  • Memory operand instruction: only memory address parameter

  • Register-Register operand instruction: two register parameters

  • Register-Immediate operand instruction: one register parameter, one literal parameter

  • Immediate-Register operand instruction: one literal parameter, one register parameter

  • Register-Memory operand instruction: one register parameter, one memory address parameter

  • Memory-Register operand instruction: one memory address parameter, one register parameter

  • Memory-Memory operand instruction: two memory address parameters

  • Immediate-Immediate operand instruction: two literal parameters

  • Memory-Immediate operand instruction: one memory address parameter, one literal parameter

  • Register-Register-Register operand instruction: three register parameters

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
              20          1a          18      10            09    06          00
Default       | register  | register  | register  |         | typ | operator  |
              | register  | register  |                     | typ | operator  |
              | register  |                                 | typ | operator  |
--------------------------------------------------------------------------------
Zero          | flags                                       | typ | operator  |
Register      | register  | flags                           | typ | operator  |
Immediate     | literal                       | flags       | typ | operator  |
RR            | register  | register  | flags               | typ | operator  |
RI            | register  | literal                     |   | typ | operator  |
IR            | register  | literal                     |   | typ | operator  |
II            | literal       | literal       | flags       | typ | operator  |
RRR           | register  | register  | register  | flags   | typ | operator  |
RRI           | register  | register  | literal       |flags| typ | operator  |
RIR           | register  | register  | literal       |flags| typ | operator  |

* registers: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer.
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction pattern, distinguished by instruction type
* RRI and RIR are two variants of the same instruction pattern, distinguished by instruction type
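As an illustration of the RRR row, the following Python sketch encodes and decodes one 32-bit instruction word. The field positions are read off the column widths of the layout above: operator in bits 0 to 5, typ in bits 6 to 8, flags in bits 9 to 13, and three 6-bit register codes above them. Which operand occupies the highest bits is an assumption, and the operator and typ numbers used in the example are placeholders, since no opcode values are given here.

```python
from typing import Dict


def encode_rrr(operator: int, typ: int, r1: int, r2: int, r3: int,
               flags: int = 0) -> int:
    """Pack an RRR instruction; r1 is assumed to sit in the highest bits."""
    fields = ((operator, 6), (typ, 3), (flags, 5), (r1, 6), (r2, 6), (r3, 6))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"field value {value} does not fit in {bits} bits")
    return ((r1 << 26) | (r2 << 20) | (r3 << 14)
            | (flags << 9) | (typ << 6) | operator)


def decode_rrr(word: int) -> Dict[str, int]:
    """Split a 32-bit RRR instruction word back into its fields."""
    return {
        "operator": word & 0x3F,
        "typ": (word >> 6) & 0x7,
        "flags": (word >> 9) & 0x1F,
        "r3": (word >> 14) & 0x3F,
        "r2": (word >> 20) & 0x3F,
        "r1": (word >> 26) & 0x3F,
    }
```

The six field widths add up to exactly 32 bits, so every RRR word fits the fixed instruction width.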
2.1.3.12.8.1  Instruction Set [IS]

The instruction set includes the following instructions:

  • Interrupt and Exception Handling

    • Int: interrupt invoke instruction

      • I, :int idx: invoke the interrupt with index idx
      • R, :int reg: invoke the interrupt with the address stored in register reg
  • Snapshot Exception Handling

    • Snap: snapshot exception invoke instruction

      • :snap: invoke a snapshot exception
    • Raise: raise exception instruction

      • I, :raise code: raise an exception with code code
  • Data Management

    • Mov: move data instruction

      • RR, :mov s dst, shl d(src): move data from src to dst, shifted left by shl bits, padding with 0 or 1 selected by + or -.
      • RI, :mov offset dst, val: move the immediate value val to dst. offset can be low16, high16, low16h, or high16h, selecting the low 16 or high 16 bits of the low 32 bits of dst, or the low 16 or high 16 bits of the high 32 bits of dst. E.g., :mov l Reg#1, 0xffff sets the low 16 bits of Reg#1 to 0xffff, and :mov l Reg#1, 0x8fff sets the low 16 bits of Reg#1 to 0x8fff, with other versions of :mov
      • RR, :mov offset dst, ptr[src]: dereference the memory address src and move the data to dst. ptr can be qword (0), bytes (1), word (2), or dword (4) for the data size; offset can be 0, 1, 2, or 4 for the target data offset. E.g., mov ah, ptr[rbx] in x86_64 assembly can be represented as :mov 1 ah, bytes ptr[Reg#1], while mov eax, ptr[rbx] can be represented as :mov 0 eax, dword ptr[Reg#1]
      • RR, :mov ptr[dst], offset src: move data from src to the memory address dst. ptr can be qword (0), bytes (1), word (2), or dword (4) for the data size; offset can be 0, 1, 2, or 4 for the source data offset. E.g., mov ptr[rbx], ah in x86_64 assembly can be represented as :mov 1[Reg#1], 1 ah, while mov ptr[rbx], eax can be represented as :mov 4[Reg#1], 0 eax
      • RR, :mov ptr [dst], [src]: move data from the memory address src to the memory address dst. ptr can be qword (0), bytes (1), word (2), or dword (4) for the data size
      • RRI, :mov dst, ptr[base + offset]: move data from the memory address calculated as the base register plus the immediate offset to dst
      • RIR, :mov ptr[base + offset], src: move data from src to the memory address calculated as the base register plus the immediate offset
    • LSD: load / save data instruction

      • I, :lsd op idx: load or save data between the global data stack and register Reg#A with index idx

      op can be one of the following:

      • load: load data from the global data stack to Reg#A
      • save: save data from Reg#A to the global data stack
      • loadr: load data from the global data stack to Reg#R
      • saver: save data from Reg#R to the global data stack
      • loadc: load pre-defined data to Reg#A
  • Arithmetic Computation

    • OpI: arithmetic integer computation

      • RR, :opi op dst, src: perform arithmetic operation op on the integers dst and src; store the result into Reg#A and the carry or remainder into Reg#R
      • RI, :opi op dst, val: perform arithmetic operation op on the integer dst and the immediate value val; store the result into Reg#A and the carry or remainder into Reg#R
      • IR, :opi op val, src: perform arithmetic operation op on the integer src and the immediate value val; store the result into Reg#A and the carry or remainder into Reg#R
      • RR, :opi op dst, ptr[src]: perform arithmetic operation op on the integer dst and the integer at memory address src; store the result into Reg#A and the carry or remainder into Reg#R
      • RR, :opi op ptr[dst], src: perform arithmetic operation op on the integer at memory address dst and the integer src; store the result into Reg#A and the carry or remainder into Reg#R
      • RR, :opi op ptr[dst], [src]: perform arithmetic operation op on the integer at memory address dst and the integer at memory address src; store the result into Reg#A and the carry or remainder into Reg#R

      op can be one of the following:

      • add: addition
      • sub: subtraction
      • mul: multiplication
      • div: division
    • OpU: arithmetic unsigned integer computation

      • RR, :opu op dst, src: perform arithmetic operation op on unsigned integers dst and src, store the result in Reg#A and the carry or remainder in Reg#R
      • RI, :opu op dst, val: perform arithmetic operation op on unsigned integer dst and immediate value val, store the result in Reg#A and the carry or remainder in Reg#R
      • IR, :opu op val, src: perform arithmetic operation op on unsigned integer src and immediate value val, store the result in Reg#A and the carry or remainder in Reg#R
      • RR, :opu op dst, ptr[src]: perform arithmetic operation op on unsigned integer dst and the unsigned integer at memory address src, store the result in Reg#A and the carry or remainder in Reg#R
      • RR, :opu op ptr[dst], src: perform arithmetic operation op on the unsigned integer at memory address dst and unsigned integer src, store the result in Reg#A and the carry or remainder in Reg#R
      • RR, :opu op ptr[dst], [src]: perform arithmetic operation op on the unsigned integers at memory addresses dst and src, store the result in Reg#A and the carry or remainder in Reg#R

      op can be one of the following:

      • add: addition
      • sub: subtraction
      • mul: multiplication
      • div: division
    • OpF: arithmetic floating-point computation

      • RR, :opf op dst, src: perform arithmetic operation op on floating-point values dst and src, store the result in Reg#A
      • RR, :opf op dst, ptr[src]: perform arithmetic operation op on floating-point value dst and the floating-point value at memory address src, store the result in Reg#A
      • RR, :opf op ptr[dst], src: perform arithmetic operation op on the floating-point value at memory address dst and floating-point value src, store the result in Reg#A
      • RR, :opf op ptr[dst], [src]: perform arithmetic operation op on the floating-point values at memory addresses dst and src, store the result in Reg#A
      • RR, :opf fmod dst, src: perform the floating-point modulus operation on dst and src, store the result in Reg#A
      • RR, :opf fmod ptr [dst], [src]: perform the floating-point modulus operation on the values at memory addresses dst and src, store the result in Reg#A

      op can be one of the following:

      • fadd: floating-point addition
      • fsub: floating-point subtraction
      • fmul: floating-point multiplication
      • fdiv: floating-point division
    • OpB: bitwise computation

      • RR, :opb op dst, src: perform bitwise operation op on dst and src, store the result in Reg#A
      • RI, :opb op dst, val: perform bitwise operation op on dst and immediate value val, store the result in Reg#A
      • RR, :opb op dst, ptr[src]: perform bitwise operation op on dst and the value at memory address src, store the result in Reg#A
      • RR, :opb op ptr[dst], src: perform bitwise operation op on the value at memory address dst and src, store the result in Reg#A
      • RR, :opb op ptr[dst], [src]: perform bitwise operation op on the values at memory addresses dst and src, store the result in Reg#A
      • RI, :opb op ptr[dst], val: perform bitwise operation op on the value at memory address dst and immediate value val, store the result in Reg#A

      op can be one of the following:

      • and: bitwise AND
      • or: bitwise OR
      • xor: bitwise XOR
      • not: bitwise NOT
    • OpS: shift computation

      • RR, :ops op dst, src: perform shift operation op on dst by src bits, store the result in Reg#A
      • RR, :ops op dst, ptr[src]: perform shift operation op on dst by the number of bits stored at memory address src, store the result in Reg#A
      • RR, :ops op ptr[dst], src: perform shift operation op on the value at memory address dst by src bits, store the result in Reg#A
      • RR, :ops op ptr[dst], [src]: perform shift operation op on the value at memory address dst by the number of bits stored at memory address src, store the result in Reg#A

      op can be one of the following:

      • shl: shift left
      • shr: shift right
      • sal: arithmetic shift left
      • sar: arithmetic shift right
      • rol: rotate left
      • ror: rotate right
      • rcl: rotate left through carry
      • rcr: rotate right through carry
  • Condition Test and Branch

    • Test: condition test instruction

      • II, :test cond, jmp: test condition cond; if true, jump to the near address at offset jmp
  • Control Flow Jump

    • Jmp: control flow jump instruction

      • I, :jmp:near offset: jump to the near address given by offset
      • R, :jmp:short dst: jump to the short address stored in register dst
      • RI, :jmp:far segment : offset: jump to far address offset in function unit segment
  • Loop Control

    • Loop: loop control instruction

      • I, :loop offset: decrement Reg#C by one; if the result is not zero, jump to the near address given by offset
  • Function Call and Return

    • Call: function call instruction

      • I, :call idx: call the function with index idx in the function unit vector
      • R, :call dst: call the function whose address is stored in register dst
    • Ret: function return instruction

      • :ret: return from the current function
    • IRet: interrupt return instruction

      • :iret: return from interrupt
    • RegF: register new function instruction

      • RR, :regf skip, len: register a new function unit with code length len in bytes, skipping the first skip bytes in the global data stack
  • Stack Management

    • Stack: stack management instruction

      • I, :stack alloc size: allocate size bytes of stack space
      • I, :stack free size: free size bytes of stack space
      • Zero, :stack clear: clear the current function stack frame
      • Zero, :stack dump: dump current stack frame information for debugging
      • Zero, :stack create: create a new function stack frame
      • Zero, :stack destroy: destroy the current function stack frame and return to the previous function stack frame
      • I, :stack duplicate idx: duplicate the current stack segment, update Reg#SS to point to the new stack segment, and store the previous stack segment pointer in the global data stack at index idx
      • I, :stack restore idx: restore the previous stack segment from the global data stack at index idx, and update Reg#SS to point to the restored stack segment
    • Push: push data onto stack instruction

      • R, :push src: push data from register src onto the stack
      • I, :push val: push immediate value val onto the stack
      • R, :push ptr[src]: push data from memory address src onto the stack
    • Pop: pop data from stack instruction

      • R, :pop dst: pop data from the stack into register dst
      • R, :pop ptr[dst]: pop data from the stack into memory address dst
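
The OpI and OpU entries in the overview above say that the result goes to Reg#A and the "carry or remainder" goes to Reg#R. The following is a minimal sketch of one possible reading for the unsigned case, not the actual implementation. It assumes 64-bit registers, that Reg#R holds the carry for add, the borrow for sub, the high 64 bits for mul, and the remainder for div. The names opu and VmException are hypothetical.

```python
MASK64 = (1 << 64) - 1  # assumption: registers are 64 bits wide

DIVISION_BY_ZERO = 0x05  # "Division by Zero Exception" code from the Raise section


class VmException(Exception):
    """Hypothetical stand-in for an exception raised by the machine."""

    def __init__(self, code):
        super().__init__(f"exception 0x{code:02X}")
        self.code = code


def opu(op, dst, src):
    """Return (reg_a, reg_r) for an unsigned 64-bit operation (assumed semantics)."""
    dst &= MASK64
    src &= MASK64
    if op == "add":
        total = dst + src
        return total & MASK64, total >> 64            # carry out: 0 or 1
    if op == "sub":
        return (dst - src) & MASK64, int(dst < src)   # borrow: 0 or 1
    if op == "mul":
        product = dst * src
        return product & MASK64, product >> 64        # high 64 bits of the product
    if op == "div":
        if src == 0:
            raise VmException(DIVISION_BY_ZERO)
        return dst // src, dst % src                  # quotient, remainder
    raise ValueError(f"unknown op: {op}")
```

For example, opu("div", 7, 2) yields quotient 3 in the first slot and remainder 1 in the second.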
2.1.3.12.8.2  Instruction: Int [Int]

The Int instruction is used to invoke an interrupt.

0x            20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Register      | register  |                                 | typ | operator  |
Immediate     | literal                       |             | typ | operator  |

* registers: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction, distinguished by instruction type

There are two variants of the Int instruction:

  • Register variant:

    • Syntax: :int reg
    • Description: invoke the interrupt whose handler address is stored in register reg
  • Immediate variant:

    • Syntax: :int idx
    • Description: invoke the interrupt with index idx

No flags are used.

The int instruction saves the current execution status and jumps to the interrupt handler function unit. All registers are pushed onto the global data stack. The interrupt handler function unit returns to the previous execution status with the iret instruction.

Apart from the 6-bit operator code and the 3-bit type code, the remaining 17 bits in the register variant and the remaining 7 bits in the immediate variant must be 0. Otherwise, an invalid instruction exception is raised automatically.

The idx in the immediate variant is a 15-bit unsigned integer. If idx is larger than 0xff, an invalid interrupt exception is raised automatically.

There are some pre-defined interrupt indices:

  • 0x00: Exception Interrupt
  • 0x01: System Call Interrupt
  • 0xff: Halt Interrupt

In the register variant, the value in the register must be aligned to 4 bytes; an unaligned address raises an invalid interrupt exception automatically. The value is treated as an address in the global data stack segment, and the interrupt handler function unit is invoked from that address.
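
The validity rules above can be sketched as two small checks. This is an illustration only: the field positions are read off the bit diagram (operator in bits 5..0, typ in bits 8..6, literal in bits 31..16, register in bits 31..26), the exception codes are the ones listed in the Raise section, and the function names and the ("ok", ...) / ("raise", ...) return convention are hypothetical.

```python
INVALID_INSTRUCTION = 0x01  # "Invalid Instruction Exception"
INVALID_INTERRUPT = 0x08    # "Invalid Interrupt Exception"


def check_int_immediate(word):
    """Validate an immediate-variant Int word; return ('ok', idx) or ('raise', code)."""
    reserved = (word >> 9) & 0x7F        # bits 15..9 must be zero
    if reserved != 0:
        return ("raise", INVALID_INSTRUCTION)
    idx = (word >> 16) & 0xFFFF          # literal field as drawn in the diagram
    if idx > 0xFF:
        return ("raise", INVALID_INTERRUPT)
    return ("ok", idx)


def check_int_register(word, register_value):
    """Validate a register-variant Int word against the value held in its register."""
    reserved = (word >> 9) & 0x1FFFF     # bits 25..9 must be zero
    if reserved != 0:
        return ("raise", INVALID_INSTRUCTION)
    if register_value % 4 != 0:          # handler address must be 4-byte aligned
        return ("raise", INVALID_INTERRUPT)
    return ("ok", register_value)
```

The operator and type codes themselves are not checked here, since their numeric values are not given in this section.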

2.1.3.12.8.3  Instruction: Snap [Snap]

The Snap instruction is used to invoke a snapshot exception.

0x            20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Zero          |                                           |f| typ | operator  |

* registers: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction, distinguished by instruction type

The Snap instruction has no parameters other than an optional flag.

  • Syntax: :snap flags
  • Description: invoke a snapshot exception

flags in the syntax can be full or light, indicating a full snapshot or a light snapshot. If flags is omitted, full is assumed. If flags is full, the flag bit f is set to 1, otherwise to 0. If a light snapshot is invoked, only the register records are snapshotted.

The snap instruction duplicates the current global data stack segment, the execution stack segment, and the register records. The snapshot exception may then be handled by the exception handler function unit. Restoring a snapshot must be handled explicitly by the user program.

Apart from the 6-bit operator code, the 3-bit type code, and the 1-bit flag f, the remaining 22 bits must be 0. Otherwise, an invalid instruction exception is raised automatically.
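
As a rough illustration of the encoding described above, the sketch below packs the f bit (bit 9 in the diagram) next to the type and operator fields and checks the 22 reserved bits. The numeric operator and type codes of Snap are not stated in this section, so they are passed in as parameters; encode_snap and snap_is_valid are hypothetical names.

```python
def encode_snap(operator, typ, flags="full"):
    """Build a Snap instruction word; 'full' sets f to 1, 'light' sets it to 0."""
    if flags not in ("full", "light"):
        raise ValueError(f"unknown snapshot flag: {flags}")
    f = 1 if flags == "full" else 0
    return (f << 9) | ((typ & 0x7) << 6) | (operator & 0x3F)


def snap_is_valid(word):
    """Bits 31..10 are reserved and must all be zero."""
    return (word >> 10) == 0
```

With a placeholder operator code of 0x2A and type code 0, encode_snap(0x2A, 0) differs from encode_snap(0x2A, 0, "light") only in bit 9.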

2.1.3.12.8.4  Instruction: Raise [Raise]

The Raise instruction is used to raise an exception.

0x            20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Immediate     | literal                       |             | typ | operator  |

* registers: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction, distinguished by instruction type

The Raise instruction has one immediate parameter.

  • Syntax: :raise code
  • Description: raise the exception with code code

No flags are used.

The raise instruction invokes the exception handler function unit. The exception code is passed to the exception handler function unit via Reg#FLAGS. The exception handler function unit may return to the previous execution status with the iret instruction. The exception handler is a special interrupt handler.

There are some pre-defined exception codes:

  • 0x00: General Exception
  • 0x01: Invalid Instruction Exception
  • 0x02: Invalid Operand Exception
  • 0x03: Invalid Variation Exception
  • 0x04: Arithmetic Exception
  • 0x05: Division by Zero Exception
  • 0x06: Shift Count Exception
  • 0x07: Arithmetic Overflow Exception
  • 0x08: Invalid Interrupt Exception
  • 0x09: Invalid Function Call Exception
  • 0x0A: Invalid Parameter Exception
  • 0x0B: Invalid Memory Access Exception
  • 0x0C: Invalid Segment Access Exception
  • 0x0D: Invalid Register Access Exception
  • 0x0E: Stack Overflow Exception
  • 0x0F: Stack Underflow Exception
  • 0x10: Invalid Register Access Exception
  • 0x11: Snapshot Restore Exception
  • 0x12: Snapshot Exception
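
For tooling such as a disassembler or debugger, the code list above maps naturally onto a lookup table. The sketch below simply transcribes the list; the table name, the helper describe_exception, and the fallback text for unlisted codes are hypothetical.

```python
EXCEPTION_NAMES = {
    0x00: "General Exception",
    0x01: "Invalid Instruction Exception",
    0x02: "Invalid Operand Exception",
    0x03: "Invalid Variation Exception",
    0x04: "Arithmetic Exception",
    0x05: "Division by Zero Exception",
    0x06: "Shift Count Exception",
    0x07: "Arithmetic Overflow Exception",
    0x08: "Invalid Interrupt Exception",
    0x09: "Invalid Function Call Exception",
    0x0A: "Invalid Parameter Exception",
    0x0B: "Invalid Memory Access Exception",
    0x0C: "Invalid Segment Access Exception",
    0x0D: "Invalid Register Access Exception",
    0x0E: "Stack Overflow Exception",
    0x0F: "Stack Underflow Exception",
    0x10: "Invalid Register Access Exception",  # same name as 0x0D in the list
    0x11: "Snapshot Restore Exception",
    0x12: "Snapshot Exception",
}


def describe_exception(code):
    """Return a printable label for an exception code (fallback text is an assumption)."""
    name = EXCEPTION_NAMES.get(code, "Unlisted Exception")
    return f"0x{code:02X}: {name}"
```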
2.1.3.12.8.5  Instruction: Mov [Mov]

The Mov instruction is used to move data between registers and memory.

0x            20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
RR            | register  | register  |k| l |d| shl       |s| typ | operator  |
RR(1)         | register  | register  |             |o  |ss | typ | operator  |
RI            | register  | literal                     |o  | typ | operator  |
IR            | register  | literal                     |o  | typ | operator  |
RRI           | register  | register  | literal       | |ss | typ | operator  |

* registers: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction, distinguished by instruction type

The Mov instruction has the following variants:

  • Register-Register variant:

    • Syntax: :mov s dst, shl d(src)
    • Type code: 0, RR
    • Description: move data from src to dst, shifted left by shl bits by default, padding with 0 or 1 as selected by + or -; + is the default if omitted.

      • dst: target register
      • src: source register
      • shl: shift amount in bits, from 0 to 63, optional, default is 0 if omitted
      • s: + or -, padding with 0 or 1, optional; if provided, the shift uses manual padding
      • d: shift direction and type, optional, default is shift left logical if omitted: < for shift left logical, > for shift right logical, >> for shift right arithmetic, rol for rotate left, ror for rotate right, lp for manual padding shift
    • Flags:

      • s: padding bit, 0 for +, 1 for -, available only when l is 11
      • d: shift direction, 0 for left, 1 for right
      • l: shift type code

        • 00: logical shift
        • 01: arithmetic shift
        • 10: rotate shift
        • 11: manual padding shift
      • shl: 6-bit shift amount
      • k: short process flag; if read as 0, the instruction is treated as a shift left logical by shl bits.
  • Register-Immediate variant:

    • Syntax: :mov offset dst, val
    • Type code: 1 / 2, RI / IR; if the literal has its 16th bit set, the second type code is used, otherwise the first
    • Description: move immediate value val to dst. offset can be low16 or high16 for the low or high 16 bits of the low 32 bits of dst, or low16h or high16h for the low or high 16 bits of the high 32 bits of dst.

      • dst: target register
      • val: immediate value
      • offset: target offset
    • Flags:

      • o: 2-bit offset code

        • 00: low 16
        • 01: high 16
        • 10: low 16h
        • 11: high 16h
  • Register-Address(Register) variant:

    • Syntax: :mov ptr[dst], offset src
    • Type code: 3, RR(1)
    • Description: move data from src to memory address dst. ptr selects the data size and can be qword (0), bytes (1), word (2), or dword (4); offset selects the source data offset and can be 0, 1, 2, or 4.

      • dst: target memory address register
      • src: source register
      • ptr: data size
      • offset: source data offset
    • Flags:

      • ss: 2-bit source data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword
      • o: 2-bit source data offset

        • 00: 0
        • 01: 1
        • 10: 2
        • 11: 4
  • Address(Register)-Register variant:

    • Syntax: :mov offset dst, ptr[src]
    • Type code: 4, RR(2)
    • Description: dereference memory address src and move the data to dst. ptr selects the data size and can be qword (0), bytes (1), word (2), or dword (4); offset selects the target data offset and can be 0, 1, 2, or 4.

      • dst: target register
      • src: source memory address register
      • ptr: data size
      • offset: target data offset
    • Flags:

      • ss: 2-bit target data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword
      • o: 2-bit target data offset

        • 00: 0
        • 01: 1
        • 10: 2
        • 11: 4
  • Address(Register)-Address(Register) variant:

    • Syntax: :mov ptr [dst], [src]
    • Type code: 5, RR
    • Description: move data from memory address src to memory address dst. ptr selects the data size and can be qword (0), bytes (1), word (2), or dword (4).

      • dst: target memory address register
      • src: source memory address register
      • ptr: data size
    • Flags:

      • ss: 2-bit data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword
  • Register-Address(Register + Immediate) variant:

    • Syntax: :mov dst, ptr[base + offset]
    • Type code: 6, RRI
    • Description: move data from the memory address calculated from the base register plus the immediate offset to dst

      • dst: target register
      • base: base register for memory address calculation
      • offset: immediate offset for memory address calculation
    • Flags: none
  • Address(Register + Immediate)-Register variant:

    • Syntax: :mov ptr[base + offset], src
    • Type code: 7, RIR
    • Description: move data from src to the memory address calculated from the base register plus the immediate offset

      • src: source register
      • base: base register for memory address calculation
      • offset: immediate offset for memory address calculation
    • Flags: none

All other combinations of source and target operands are invalid for the Mov instruction.
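
To make the register-register flags concrete, here is a small sketch that extracts them from an instruction word and applies the shift to a source value. Field positions are read off the RR row of the bit diagram (s at bit 9, shl at bits 15..10, d at bit 16, l at bits 18..17, k at bit 19). The 64-bit register width, the use of shl as the amount for right shifts too, and the exact fill behaviour of the manual padding shift are assumptions; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit registers


def decode_mov_rr_flags(word):
    """Extract the RR-variant flag fields from a 32-bit instruction word."""
    return {
        "s": (word >> 9) & 0x1,      # padding bit, meaningful when l == 0b11
        "shl": (word >> 10) & 0x3F,  # shift amount, 0..63
        "d": (word >> 16) & 0x1,     # 0 = left, 1 = right
        "l": (word >> 17) & 0x3,     # shift type code
        "k": (word >> 19) & 0x1,     # 0 = plain shift left logical
    }


def apply_mov_shift(src, flags):
    """Return the value written to dst for the given source value and flags."""
    src &= MASK64
    n = flags["shl"]
    if flags["k"] == 0:                       # short process: logical left shift
        return (src << n) & MASK64
    left = flags["d"] == 0
    kind = flags["l"]
    if kind == 0b00:                          # logical shift
        return (src << n) & MASK64 if left else src >> n
    if kind == 0b01:                          # arithmetic shift
        if left:
            return (src << n) & MASK64
        fill = ((1 << n) - 1) << (64 - n) if src >> 63 else 0
        return (src >> n) | fill
    if kind == 0b10:                          # rotate
        if n == 0:
            return src
        if left:
            return ((src << n) | (src >> (64 - n))) & MASK64
        return ((src >> n) | (src << (64 - n))) & MASK64
    pad = (1 << n) - 1 if flags["s"] else 0   # manual padding: fill vacated bits with s
    if left:
        return ((src << n) & MASK64) | pad
    return (src >> n) | (pad << (64 - n))
```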

Basically, the mov instruction copies data from the source operand to the target operand directly.
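
For the register-immediate variant, each move writes one 16-bit value into one of four slots of the target register, so a full 64-bit constant takes four moves. The sketch below illustrates that idea. It assumes 64-bit registers and that the bits outside the selected slot are left unchanged; mov_immediate and load_constant are hypothetical names, and the split of the value into a 15-bit literal plus a type-code bit is not modelled.

```python
MASK64 = (1 << 64) - 1

# Offset names mapped to the 2-bit o code listed for the RI variant.
OFFSET_CODES = {"low16": 0b00, "high16": 0b01, "low16h": 0b10, "high16h": 0b11}


def mov_immediate(dst, val, offset="low16"):
    """Write the 16-bit value val into the slot of dst selected by offset."""
    if not 0 <= val <= 0xFFFF:
        raise ValueError("immediate does not fit in 16 bits")
    shift = 16 * OFFSET_CODES[offset]
    cleared = dst & ~(0xFFFF << shift) & MASK64   # assumption: other slots are preserved
    return cleared | (val << shift)


def load_constant(value):
    """Assemble a 64-bit constant from four immediate moves."""
    reg = 0
    for name, code in OFFSET_CODES.items():
        reg = mov_immediate(reg, (value >> (16 * code)) & 0xFFFF, name)
    return reg
```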

The whole mov instruction family can be divided into three categories:

  • Register to register move, covering the register-register variant. It copies data between registers, with an optional shift applied during the move. Depending on the shift type and direction, the data in the source register is shifted left or right by the specified number of bits and then moved to the target register. In the most common case, without a shift operation, the data in the source register is copied directly to the target register: dst = src; With the default shift operation, the data in the source register is shifted left logically by the specified number of bits and then moved to the target register: dst = src << shl; With an explicitly specified shift operation, the data is shifted according to the shift type and direction before it is moved. With register to register moves, the user can simulate 32-bit general-purpose registers, as in the RISC-V or x86_64 architectures.
  • Immediate to register move, covering register-immediate variants I and II. It moves an immediate value to the target register directly. 15 bits of the immediate value are stored in the instruction, and the 16th bit is distinguished by the type code: for register-immediate variant I the 16th bit is 1, for register-immediate variant II the 16th bit is 0. The immediate value is assigned to the 16-bit word of the target register selected by the offset parameter. Since the flags hold a 2-bit offset code, all four 16-bit words of the target register can be assigned separately.
  • Register to memory, memory to register, and memory to memory move, covering the memory-addressing RR variants and the RRI and RIR variants. Addressing uses a register, or a register plus an immediate offset: the value in the register is read, the immediate is added if present, and the result is used as a memory address into the global data stack. Data size and data offset must be specified by flags. For data size:

    • 0 means qword (8 bytes)
    • 1 means bytes (1 byte)
    • 2 means word (2 bytes)
    • 3 means dword (4 bytes)

    For data offset:

    • 0 means offset 0
    • 1 means offset 1
    • 2 means offset 2
    • 3 means offset 4

    In the RR variant with Address(Register)-Address(Register) operands, o must be 0. For the RR(A(R)R) or RRI variant, the number of bytes selected by ss is read from the source memory address and written to the target register at offset o. For the RR(RA(R)) or RIR variant, the number of bytes selected by ss is read from the source register at offset o and written to the target memory address. For the RR(A(R)A(R)) variant, the number of bytes selected by ss is read from the source memory address and written to the target memory address.
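
The size and offset codes above can be illustrated with a toy model of the load and store directions. This is only a sketch under several assumptions that the text does not state: the global data stack is modelled as a bytearray, values are little-endian, the offset counts bytes inside a 64-bit register, untouched register bytes are preserved on a load, and a size plus offset that would run past the register is rejected. All names are hypothetical.

```python
MASK64 = (1 << 64) - 1

SIZE_BYTES = {0b00: 8, 0b01: 1, 0b10: 2, 0b11: 4}    # ss code -> size in bytes
OFFSET_BYTES = {0b00: 0, 0b01: 1, 0b10: 2, 0b11: 4}  # o code -> byte offset


def _check(size, offset):
    if offset + size > 8:  # assumption: access must fit inside the register
        raise ValueError("size and offset exceed the 8-byte register")


def load_to_register(memory, address, dst, ss, o):
    """Memory -> register: read SIZE_BYTES[ss] bytes and place them at byte offset o."""
    size, offset = SIZE_BYTES[ss], OFFSET_BYTES[o]
    _check(size, offset)
    data = int.from_bytes(memory[address:address + size], "little")
    mask = ((1 << (8 * size)) - 1) << (8 * offset)
    return (dst & ~mask & MASK64) | (data << (8 * offset))


def store_from_register(memory, address, src, ss, o):
    """Register -> memory: take SIZE_BYTES[ss] bytes at byte offset o and write them."""
    size, offset = SIZE_BYTES[ss], OFFSET_BYTES[o]
    _check(size, offset)
    data = (src >> (8 * offset)) & ((1 << (8 * size)) - 1)
    memory[address:address + size] = data.to_bytes(size, "little")
```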

2.1.3.12.8.6  Instruction: LSD [LSD]

The LSD instruction is used to load or save data between the global data stack and register Reg#A or Reg#R.

0x            20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Immediate     | literal                       |             | typ | operator  |

* registers: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction, distinguished by instruction type

The LSD instruction has one immediate parameter.

  • Syntax: :lsd op idx
  • Description: load or save data between the global data stack and register Reg#A at index idx

    • op: operation type, can be one of the following:

      • load: load data from global data stack to Reg#A
      • save: save data from Reg#A to global data stack
      • loadr: load data from global data stack to Reg#R
      • saver: save data from Reg#R to global data stack
      • loadc: load pre-defined data to Reg#A
  • Flags:

    • typ: 3-bit operation type code

      • 000: load
      • 001: save
      • 010: load into Reg#R
      • 011: save from Reg#R
      • 100: load constant
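
Since the operation is carried in the 3-bit typ field and idx in the literal field, an assembler could pack an LSD word roughly as follows. Field positions come from the bit diagram (operator in bits 5..0, typ in bits 8..6, literal in bits 31..16). The numeric operator code of LSD is not given in this section, so it is a required parameter; encode_lsd and decode_lsd are hypothetical names.

```python
LSD_TYPE_CODES = {
    "load": 0b000,
    "save": 0b001,
    "loadr": 0b010,
    "saver": 0b011,
    "loadc": 0b100,
}


def encode_lsd(op, idx, operator):
    """Pack an LSD instruction word from the operation name, index and operator code."""
    if not 0 <= idx <= 0xFFFF:
        raise ValueError("idx does not fit in the literal field")
    if not 0 <= operator <= 0x3F:
        raise ValueError("operator code does not fit in 6 bits")
    return (idx << 16) | (LSD_TYPE_CODES[op] << 6) | operator


def decode_lsd(word):
    """Return (operation name, idx); raises KeyError for an unlisted type code."""
    names = {code: name for name, code in LSD_TYPE_CODES.items()}
    return names[(word >> 6) & 0x7], (word >> 16) & 0xFFFF
```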

The behaviour depends on the operation:

  • load: data is loaded from the global data stack at index idx into register Reg#A.
  • save: data is saved from register Reg#A into the global data stack at index idx.
  • loadr: data is loaded from the global data stack at index idx into register Reg#R.
  • saver: data is saved from register Reg#R into the global data stack at index idx.
  • loadc: the pre-defined data with index idx is loaded into register Reg#A.

For loadc, the index idx can be:

  • 0: unsigned 64-bit integer 0
  • 1: unsigned 64-bit integer maximum value
  • 2: unsigned 64-bit integer minimum value
  • 3: signed 64-bit integer 0
  • 4: signed 64-bit integer maximum value
  • 5: signed 64-bit integer minimum value
  • 6: IEEE 754 double-precision floating-point 0.0
  • 7: IEEE 754 double-precision floating-point maximum value
  • 8: IEEE 754 double-precision floating-point minimum value
  • 9: IEEE 754 double-precision floating-point Not-a-Number (NaN)
  • 10: IEEE 754 double-precision floating-point positive infinity
  • 11: IEEE 754 double-precision floating-point negative infinity
  • 12: IEEE 754 single-precision floating-point 0.0
  • 13: IEEE 754 single-precision floating-point maximum value
  • 14: IEEE 754 single-precision floating-point minimum value
  • 15: IEEE 754 single-precision floating-point Not-a-Number (NaN)
  • 16: IEEE 754 single-precision floating-point positive infinity
  • 17: IEEE 754 single-precision floating-point negative infinity
  • 18: boolean true
  • 19: boolean false
  • 20: character '\0'
  • 21: character maximum value
  • 22: character minimum value
  • 23: null pointer

In short, the lsd instruction provides a simple way to load or save data between the global data stack and register Reg#A or Reg#R.
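
Part of the loadc table can be written down as raw 64-bit register contents. The sketch below covers only the entries whose bit patterns are unambiguous, assuming two's complement integers and that a double is held in Reg#A as its raw IEEE 754 bits. Entries such as the floating-point "minimum value", NaN, the character limits, and the null pointer are left out because the text does not pin down their exact encoding. LOADC_CONSTANTS and double_bits are hypothetical names.

```python
import math
import struct


def double_bits(x):
    """Return the raw IEEE 754 double-precision bit pattern of x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


LOADC_CONSTANTS = {
    0: 0,                        # unsigned 64-bit 0
    1: (1 << 64) - 1,            # unsigned 64-bit maximum
    2: 0,                        # unsigned 64-bit minimum
    3: 0,                        # signed 64-bit 0
    4: (1 << 63) - 1,            # signed 64-bit maximum
    5: 1 << 63,                  # signed 64-bit minimum (two's complement pattern)
    6: double_bits(0.0),         # double 0.0
    10: double_bits(math.inf),   # double positive infinity
    11: double_bits(-math.inf),  # double negative infinity
}
```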

2.1.3.12.8.7  Instruction: OpI [OpI]

The OpI instruction is used to perform arithmetic integer computation.

0x            20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
RR            | register  | register  |                 | op| typ | operator  |
RR(1)         | register  | register  |             |ss | op| typ | operator  |
RI            | register  | literal                     | op| typ | operator  |
IR            | register  | literal                     | op| typ | operator  |

* registers: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction, distinguished by instruction type

The OpI instruction has the following variants:

  • Register-Register variant:

    • Syntax: :opi op dst, src
    • Type code: 0, RR
    • Description: perform arithmetic operation op on the integers in dst and src; store the result in Reg#A and the carry or remainder in Reg#R

      • dst: target register
      • src: source register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
  • Register-Immediate variant:

    • Syntax: :opi op dst, val
    • Type code: 1, RI; val can be at most a 15-bit integer, or a 14-bit signed integer
    • Description: perform arithmetic operation op on the integer in dst and the immediate value val; store the result in Reg#A and the carry or remainder in Reg#R

      • dst: target register
      • val: immediate value
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
  • Immediate-Register variant:

    • Syntax: :opi op val, src
    • Type code: 2, IR; val can be at most a 15-bit integer, or a 14-bit signed integer
    • Description: perform arithmetic operation op on the immediate value val and the integer in src; store the result in Reg#A and the carry or remainder in Reg#R

      • src: source register
      • val: immediate value
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
  • Register-Address(Register) variant:

    • Syntax: :opi op dst, ptr[src]
    • Type code: 3, RR(1)
    • Description: perform arithmetic operation op on the integer in dst and the integer at the memory address held in src; store the result in Reg#A and the carry or remainder in Reg#R. ptr selects the data size: qword (0), bytes (1), word (2), or dword (4)

      • dst: target register
      • src: source memory address register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit source data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword
  • Address(Register)-Register variant:

    • Syntax: :opi op ptr[dst], src
    • Type code: 4, RR(1)
    • Description: perform arithmetic operation op on the integer at the memory address held in dst and the integer in src; store the result in Reg#A and the carry or remainder in Reg#R. ptr selects the data size: qword (0), bytes (1), word (2), or dword (4)

      • dst: target memory address register
      • src: source register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit target data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword
  • Address(Register)-Address(Register) variant:

    • Syntax: :opi op ptr[dst], [src]
    • Type code: 5, RR
    • Description: perform arithmetic operation op on the integer at the memory address held in dst and the integer at the memory address held in src; store the result in Reg#A and the carry or remainder in Reg#R. ptr selects the data size: qword (0), bytes (1), word (2), or dword (4)

      • dst: target memory address register
      • src: source memory address register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword

OpI supports the following arithmetic operations:

  • add: addition
  • sub: subtraction
  • mul: multiplication
  • div: division

After an OpI instruction executes, the previous values of Reg#A and Reg#R are overwritten.

In short, OpI performs an arithmetic operation on integer data, storing the result in Reg#A and the carry or remainder in Reg#R. The operands play the following roles:

  • addition: dst is the augend, src is the addend
  • subtraction: dst is the minuend, src is the subtrahend
  • multiplication: dst is the multiplicand, src is the multiplier
  • division: dst is the dividend, src is the divisor
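The bit layout in the OpI diagram can be sketched as a small encoder. This is only a sketch under stated assumptions: the field positions are read off the diagram (registers in bits 31-26 and 25-20, literal in bits 25-11, ss in bits 12-11, op in bits 10-9, typ in bits 8-6, operator in bits 5-0), while the operator code OPI_OPERATOR, the numbering in OP_CODES and all function names are hypothetical placeholders that this section does not define.

```python
# Sketch of the field layout read off the diagram; not the reference assembler.
OPI_OPERATOR = 0x00   # ASSUMPTION: placeholder, the real operator code is not stated here
OP_CODES = {"add": 0, "sub": 1, "mul": 2, "div": 3}   # ASSUMPTION: numbered in list order


def _check(name, value, bits):
    """Reject values that do not fit in an unsigned field of the given width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")


def encode_rr(op, dst, src, typ=0, ss=0, operator=OPI_OPERATOR):
    """RR / RR(1): dst[31:26] src[25:20] ss[12:11] op[10:9] typ[8:6] operator[5:0]."""
    for name, value, bits in (("dst", dst, 6), ("src", src, 6), ("ss", ss, 2),
                              ("typ", typ, 3), ("operator", operator, 6)):
        _check(name, value, bits)
    return (dst << 26) | (src << 20) | (ss << 11) | (OP_CODES[op] << 9) | (typ << 6) | operator


def encode_ri(op, reg, val, typ=1, operator=OPI_OPERATOR):
    """RI / IR: register[31:26] literal[25:11] op[10:9] typ[8:6] operator[5:0]."""
    _check("reg", reg, 6)
    _check("val", val, 15)
    _check("typ", typ, 3)
    _check("operator", operator, 6)
    return (reg << 26) | (val << 11) | (OP_CODES[op] << 9) | (typ << 6) | operator


def decode(word):
    """Split a 32-bit word back into the RR(1) fields."""
    return {
        "dst": (word >> 26) & 0x3F,
        "src": (word >> 20) & 0x3F,
        "ss": (word >> 11) & 0x3,
        "op": (word >> 9) & 0x3,
        "typ": (word >> 6) & 0x7,
        "operator": word & 0x3F,
    }
```

For example, encode_rr("div", 1, 2, typ=3, ss=3) packs a Register-Address(Register) division with a dword source, and decode recovers the same fields from the resulting word.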

2.1.3.12.8.8  Instruction: OpU [OpU]

The OpU instruction performs integer arithmetic, treating its operands as unsigned integers.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
RR            | register  | register  |                 | op| typ | operator  |
RR(1)         | register  | register  |             |ss | op| typ | operator  |
RI            | register  | literal                     | op| typ | operator  |
IR            | register  | literal                     | op| typ | operator  |

* registers: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer.
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction, distinguished by instruction type

The OpU instruction has the following variants:

  • Register-Register variant:

    • Syntax: :opu op dst, src
    • Type code: 0, RR
    • Description: perform arithmetic operation op on the integers in dst and src; store the result in Reg#A and the carry or remainder in Reg#R

      • dst: target register
      • src: source register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
  • Register-Immediate variant:

    • Syntax: :opu op dst, val
    • Type code: 1, RI; val can be at most a 15-bit integer, or a 14-bit signed integer
    • Description: perform arithmetic operation op on the integer in dst and the immediate value val; store the result in Reg#A and the carry or remainder in Reg#R

      • dst: target register
      • val: immediate value
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
  • Immediate-Register variant:

    • Syntax: :opu op val, src
    • Type code: 2, IR; val can be at most a 15-bit integer, or a 14-bit signed integer
    • Description: perform arithmetic operation op on the immediate value val and the integer in src; store the result in Reg#A and the carry or remainder in Reg#R

      • src: source register
      • val: immediate value
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
  • Register-Address(Register) variant:

    • Syntax: :opu op dst, ptr[src]
    • Type code: 3, RR(1)
    • Description: perform arithmetic operation op on the integer in dst and the integer at the memory address held in src; store the result in Reg#A and the carry or remainder in Reg#R. ptr selects the data size: qword (0), bytes (1), word (2), or dword (4)

      • dst: target register
      • src: source memory address register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit source data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword
  • Address(Register)-Register variant:

    • Syntax: :opu op ptr[dst], src
    • Type code: 4, RR(1)
    • Description: perform arithmetic operation op on the integer at the memory address held in dst and the integer in src; store the result in Reg#A and the carry or remainder in Reg#R. ptr selects the data size: qword (0), bytes (1), word (2), or dword (4)

      • dst: target memory address register
      • src: source register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit target data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword
  • Address(Register)-Address(Register) variant:

    • Syntax: :opu op ptr[dst], [src]
    • Type code: 5, RR
    • Description: perform arithmetic operation op on the integer at the memory address held in dst and the integer at the memory address held in src; store the result in Reg#A and the carry or remainder in Reg#R. ptr selects the data size: qword (0), bytes (1), word (2), or dword (4)

      • dst: target memory address register
      • src: source memory address register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit data size code

        • 00: qword
        • 01: bytes
        • 10: word
        • 11: dword

OpU supports the following arithmetic operations:

  • add: addition
  • sub: subtraction
  • mul: multiplication
  • div: division

After an OpU instruction executes, the previous values of Reg#A and Reg#R are overwritten.

In short, OpU performs an arithmetic operation on integer data, storing the result in Reg#A and the carry or remainder in Reg#R. The operands play the following roles:

  • addition: dst is the augend, src is the addend
  • subtraction: dst is the minuend, src is the subtrahend
  • multiplication: dst is the multiplicand, src is the multiplier
  • division: dst is the dividend, src is the divisor
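What "treated as unsigned" means for the Reg#A / Reg#R pair can be sketched in a few lines. The sketch assumes 64-bit registers (qword is the default data size above, but the register width is not stated here), and it assumes that Reg#R receives the borrow for sub and the high half of the product for mul; the text only says "carry or remainder". The names opu and as_signed are hypothetical.

```python
MASK64 = (1 << 64) - 1   # ASSUMPTION: registers are 64 bits wide


def as_signed(bits):
    """Reinterpret a 64-bit pattern as a two's-complement signed integer."""
    bits &= MASK64
    return bits - (1 << 64) if bits >> 63 else bits


def opu(op, dst, src):
    """Return (Reg#A, Reg#R) for an unsigned operation on two 64-bit patterns."""
    a, b = dst & MASK64, src & MASK64
    if op == "add":
        total = a + b
        return total & MASK64, total >> 64       # low 64 bits, carry out
    if op == "sub":
        return (a - b) & MASK64, int(a < b)      # difference, borrow (assumed)
    if op == "mul":
        product = a * b
        return product & MASK64, product >> 64   # low half, high half (assumed)
    if op == "div":
        return a // b, a % b                     # quotient, remainder
    raise ValueError(f"unknown operation: {op}")
```

The same bit pattern gives different answers depending on the interpretation: the all-ones pattern is -1 when read with as_signed, but OpU divides it as the largest 64-bit unsigned value. Division by zero raises Python's ZeroDivisionError here; how the machine itself reports it is not covered by this sketch.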

2.1.3.12.8.9  Instruction: OpF [OpF]

The OpF instruction performs floating-point arithmetic.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
RR            | register  | register  |             |ss | op| typ | operator  |

* registers: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer.
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction, distinguished by instruction type

The OpF instruction has the following variants:

  • Register-Register variant:

    • Syntax: :opf op dst, src
    • Type code: 0, RR
    • Description: perform arithmetic operation op on the floating-point values in dst and src; store the result in Reg#A and the remainder, or the high 64 bits of a float 128 result, in Reg#R

      • dst: target register
      • src: source register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit source data size code

        • 00: double-precision
        • 01: single-precision
        • 10: float 128
  • Register-Address(Register) variant:

    • Syntax: :opf op dst, ptr[src]
    • Type code: 1, RR(1)
    • Description: perform arithmetic operation op on the floating-point value in dst and the floating-point value at the memory address held in src; store the result in Reg#A and the remainder, or the high 64 bits of a float 128 result, in Reg#R. ptr selects the data size: qword (0), dword (1), or float128 (2)

      • dst: target register
      • src: source memory address register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit source data size code

        • 00: double-precision
        • 01: single-precision
        • 10: float 128
  • Address(Register)-Register variant:

    • Syntax: :opf op ptr[dst], src
    • Type code: 2, RR(1)
    • Description: perform arithmetic operation op on the floating-point value at the memory address held in dst and the floating-point value in src; store the result in Reg#A and the remainder, or the high 64 bits of a float 128 result, in Reg#R. ptr selects the data size: qword (0), dword (1), or float128 (2)

      • dst: target memory address register
      • src: source register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit target data size code

        • 00: double-precision
        • 01: single-precision
        • 10: float 128
  • Address(Register)-Address(Register) variant:

    • Syntax: :opf op ptr[dst], [src]
    • Type code: 3, RR
    • Description: perform arithmetic operation op on the floating-point value at the memory address held in dst and the floating-point value at the memory address held in src; store the result in Reg#A and the remainder, or the high 64 bits of a float 128 result, in Reg#R. ptr selects the data size: qword (0), dword (1), or float128 (2)

      • dst: target memory address register
      • src: source memory address register
      • op: arithmetic operation
    • Flags:

      • op: 2-bit operation code
      • ss: 2-bit data size code

        • 00: double-precision
        • 01: single-precision
        • 10: float 128
  • Register-Register Modulus variant:

    • Syntax: :opf fmod dst, src
    • Type code: 4, RR
    • Description: perform the floating-point modulus operation on dst and src; store the result in Reg#A and the remainder, or the high 64 bits of a float 128 result, in Reg#R

      • dst: target register
      • src: source register
    • Flags:

      • ss: 2-bit source data size code

        • 00: double-precision
        • 01: single-precision
        • 10: float 128
  • Address(Register)-Address(Register) Modulus variant:

    • Syntax: :opf fmod ptr[dst], [src]
    • Type code: 5, RR
    • Description: perform the floating-point modulus operation on the values at the memory addresses held in dst and src; store the result in Reg#A and the remainder, or the high 64 bits of a float 128 result, in Reg#R

      • dst: target memory address register
      • src: source memory address register
    • Flags:

      • ss: 2-bit data size code

        • 00: double-precision
        • 01: single-precision
        • 10: float 128

If ss is 10, a float 128 operation is performed and Reg#R stores the high 64 bits of the result. In that case the type code must be 3 or 5, because float 128 is supported only when both operands are memory addresses.

After an OpF instruction executes, the previous values of Reg#A and Reg#R are overwritten.
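The float 128 restriction can be sketched as a validity check an assembler might run before emitting an OpF word. The helper names check_opf and split_float128 are hypothetical, and placing the low 64 bits of a float 128 result in Reg#A is an assumption; the text only states that Reg#R holds the high 64 bits.

```python
SS_NAMES = {0b00: "double-precision", 0b01: "single-precision", 0b10: "float 128"}


def check_opf(typ, ss):
    """Validate an OpF type code / ss combination and return the size name."""
    if typ not in range(6):
        raise ValueError(f"OpF type code must be 0..5, got {typ}")
    if ss not in SS_NAMES:
        raise ValueError(f"ss code {ss:02b} is not defined for OpF")
    if ss == 0b10 and typ not in (3, 5):
        # float 128 only when both operands are memory addresses
        raise ValueError("float 128 requires type code 3 or 5")
    return SS_NAMES[ss]


def split_float128(bits):
    """Split a 128-bit pattern into (Reg#A, Reg#R) = (low 64 bits, high 64 bits)."""
    mask64 = (1 << 64) - 1
    return bits & mask64, (bits >> 64) & mask64   # low half in Reg#A is ASSUMED
```

With this check, check_opf(3, 0b10) is accepted, while check_opf(0, 0b10) is rejected because a register cannot hold a float 128 operand.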

2.1.3.12.8.10  Instruction: OpB [OpB]

The OpB instruction performs bitwise operations.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
RR            | register  | register  |                 |op | typ | operator  |
RI            | register  | literal                     |op | typ | operator  |

* registers: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer.
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction, distinguished by instruction type

The OpB instruction has the following variants:

  • Register-Register variant:

    • Syntax: :opb op dst, src
    • Type code: 0, RR
    • Description: perform bitwise operation op on dst and src; store the result in Reg#A

      • dst: target register
      • src: source register
      • op: bitwise operation
    • Flags:

      • op: 2-bit operation code
  • Register-Immediate variant:

    • Syntax: :opb op dst, val
    • Type code: 1, RI; val can be at most a 15-bit integer, or a 14-bit signed integer
    • Description: perform bitwise operation op on dst and the immediate value val; store the result in Reg#A

      • dst: target register
      • val: immediate value
      • op: bitwise operation
    • Flags:

      • op: 2-bit operation code
  • Address(Register)-Register variant:

    • Syntax: :opb op ptr[dst], src
    • Type code: 2, RR
    • Description: perform bitwise operation op on the value at the memory address held in dst and the value in src; store the result in Reg#A

      • dst: target memory address register
      • src: source register
      • op: bitwise operation
    • Flags:

      • op: 2-bit operation code
  • Register-Address(Register) variant:

    • Syntax: :opb op dst, ptr[src]
    • Type code: 3, RR
    • Description: perform bitwise operation op on the value in dst and the value at the memory address held in src; store the result in Reg#A

      • dst: target register
      • src: source memory address register
      • op: bitwise operation
    • Flags:

      • op: 2-bit operation code
  • Address(Register)-Address(Register) variant:

    • Syntax: :opb op ptr[dst], [src]
    • Type code: 4, RR
    • Description: perform bitwise operation op on the value at the memory address held in dst and the value at the memory address held in src; store the result in Reg#A

      • dst: target memory address register
      • src: source memory address register
      • op: bitwise operation
    • Flags:

      • op: 2-bit operation code
  • Address(Register)-Immediate variant:

    • Syntax: :opb op ptr[dst], val
    • Type code: 5, RI; val can be at most a 15-bit integer, or a 14-bit signed integer
    • Description: perform bitwise operation op on the value at the memory address held in dst and the immediate value val; store the result in Reg#A

      • dst: target memory address register
      • val: immediate value
      • op: bitwise operation
    • Flags:

      • op: 2-bit operation code

OpB supports the following bitwise operations:

  • and: bitwise AND
  • or: bitwise OR
  • xor: bitwise XOR
  • not: bitwise NOT

When op is not, the result of applying NOT to dst is stored in Reg#A, and the result for the other operand is written to Reg#R. For all other operations, the result is stored in Reg#A and Reg#R is not modified.
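The register effects of OpB can be sketched as follows. This assumes 64-bit registers and reads the not rule as "NOT of dst goes to Reg#A, NOT of the other operand goes to Reg#R"; both are interpretations rather than statements from this section, and the function name opb is hypothetical.

```python
MASK64 = (1 << 64) - 1   # ASSUMPTION: registers are 64 bits wide


def opb(op, dst, src, reg_r):
    """Return (Reg#A, Reg#R) after a bitwise operation; reg_r is the old Reg#R."""
    if op == "and":
        return dst & src & MASK64, reg_r       # Reg#R is left untouched
    if op == "or":
        return (dst | src) & MASK64, reg_r
    if op == "xor":
        return (dst ^ src) & MASK64, reg_r
    if op == "not":
        # ASSUMED reading: both operands are inverted, one per result register
        return ~dst & MASK64, ~src & MASK64
    raise ValueError(f"unknown operation: {op}")
```

Only the not case changes Reg#R; for and, or and xor the old value is passed through unchanged.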

2.1.3.12.8.11  Instruction: OpS [OpS]

The OpS instruction performs shift operations.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
RR            | register  | register  |               | opt | typ | operator  |

* registers: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer.
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction, distinguished by instruction type

The OpS instruction has the following variants:

  • Register-Register variant:

    • Syntax: :ops op dst, src
    • Type code: 0, RR
    • Description: perform shift operation op on dst by the number of bits given in src; store the result in Reg#A

      • dst: target register
      • src: source register
      • op: shift operation
    • Flags:

      • opt: 3-bit operation code

        • 000: logical left shift
        • 001: logical right shift
        • 010: arithmetic left shift
        • 011: arithmetic right shift
        • 100: rotate left
        • 101: rotate right
        • 110: rotate through carry left
        • 111: rotate through carry right
  • Register-Address(Register) variant:

    • Syntax: :ops op dst, ptr[src]
    • Type code: 1, RR
    • Description: perform shift operation op on dst by the number of bits stored at the memory address held in src; store the result in Reg#A

      • dst: target register
      • src: source memory address register
      • op: shift operation
    • Flags:

      • opt: 3-bit operation code

        • 000: logical left shift
        • 001: logical right shift
        • 010: arithmetic left shift
        • 011: arithmetic right shift
        • 100: rotate left
        • 101: rotate right
        • 110: rotate through carry left
        • 111: rotate through carry right
  • Address(Register)-Register variant:

    • Syntax: :ops op ptr[dst], src
    • Type code: 2, RR
    • Description: perform shift operation op on the value at the memory address held in dst by the number of bits given in src; store the result in Reg#A

      • dst: target memory address register
      • src: source register
      • op: shift operation
    • Flags:

      • opt: 3-bit operation code

        • 000: logical left shift
        • 001: logical right shift
        • 010: arithmetic left shift
        • 011: arithmetic right shift
        • 100: rotate left
        • 101: rotate right
        • 110: rotate through carry left
        • 111: rotate through carry right
  • Address(Register)-Address(Register) variant:

    • Syntax: :ops op ptr[dst], [src]
    • Type code: 3, RR
    • Description: perform shift operation op on the value at the memory address held in dst by the number of bits stored at the memory address held in src; store the result in Reg#A

      • dst: target memory address register
      • src: source memory address register
      • op: shift operation
    • Flags:

      • opt: 3-bit operation code

        • 000: logical left shift
        • 001: logical right shift
        • 010: arithmetic left shift
        • 011: arithmetic right shift
        • 100: rotate left
        • 101: rotate right
        • 110: rotate through carry left
        • 111: rotate through carry right
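
The eight opt codes can be sketched on plain integers as below. The 64-bit width, the function name ops and the carry parameter are assumptions; rotate through carry is modelled as a rotation of a 65-bit value whose top bit is the carry flag, which is how the x86 RCL/RCR instructions behave and may differ from what this machine does. The sketch returns only the shifted value and drops the carry out.

```python
WIDTH = 64                      # ASSUMPTION: operands are 64 bits wide
MASK = (1 << WIDTH) - 1


def ops(opt, value, count, carry=0):
    """Return the result of shift operation opt; carry is read only by 110 and 111."""
    value &= MASK
    if opt in (0b000, 0b010):   # logical and arithmetic left shift coincide
        return (value << count) & MASK
    if opt == 0b001:            # logical right shift fills with zeros
        return value >> count
    if opt == 0b011:            # arithmetic right shift copies the sign bit
        signed = value - (1 << WIDTH) if value >> (WIDTH - 1) else value
        return (signed >> count) & MASK
    if opt in (0b100, 0b101):   # plain rotate over WIDTH bits
        n = count % WIDTH
        if opt == 0b101:
            n = (WIDTH - n) % WIDTH   # rotate right by n == rotate left by WIDTH - n
        return ((value << n) | (value >> (WIDTH - n))) & MASK
    if opt in (0b110, 0b111):   # rotate over WIDTH + 1 bits, carry is the top bit
        wide_bits = WIDTH + 1
        wide_mask = (1 << wide_bits) - 1
        wide = ((carry & 1) << WIDTH) | value
        n = count % wide_bits
        if opt == 0b111:
            n = (wide_bits - n) % wide_bits
        wide = ((wide << n) | (wide >> (wide_bits - n))) & wide_mask
        return wide & MASK      # the new carry (wide >> WIDTH) is dropped here
    raise ValueError(f"unknown opt code: {opt}")
```

Logical and arithmetic left shifts produce the same bits, so codes 000 and 010 share a branch; the two right shifts differ only in what is shifted in from the top.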
2.1.3.12.8.12  Instruction: Test [Test]

The Test instruction tests a condition and jumps to a target address if the condition is met.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
II            | literal       | literal       |             | typ | operator  |

* literal: literal constant value, either a 16-bit or an 8-bit integer.
* flags: reserved space, for instruction extension use

The Test instruction takes two immediate parameters.

  • Syntax: :test cond, addr
  • Description: test condition cond; if it is met, jump to target address addr

    • cond: condition to be tested
    • addr: target address to jump to if the condition is met
  • Flags: none

cond is actually an integer; the following names can be written instead to avoid confusion:

  • Test#e, 0, equal, zero flag is set
  • Test#g, 1, greater, not equal and sign flag equals overflow flag
  • Test#ng, 2, not greater, equal or sign flag differs from overflow flag
  • Test#l, 3, less, sign flag differs from overflow flag
  • Test#nl, 4, not less, sign flag equals overflow flag
  • Test#o, 5, overflow, overflow flag is set
  • Test#no, 6, not overflow, overflow flag is not set
  • Test#c, 7, carry, carry flag is set
  • Test#nc, 8, not carry, carry flag is not set
  • Test#z, 9, zero, zero flag is set
  • Test#nz, 10, not zero, zero flag is not set
  • Test#s, 11, sign, sign flag is set
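
The condition table can be sketched as a predicate over the four flags it mentions. The function name condition_met and the flag parameter names zf, sf, of and cf are hypothetical, and "equal" in the entries for Test#g and Test#ng is taken to mean that the zero flag is set, as in Test#e.

```python
def condition_met(cond, zf, sf, of, cf):
    """Evaluate Test condition code cond against zero, sign, overflow and carry flags."""
    table = {
        0: zf,                        # Test#e:  equal
        1: (not zf) and sf == of,     # Test#g:  greater
        2: zf or sf != of,            # Test#ng: not greater
        3: sf != of,                  # Test#l:  less
        4: sf == of,                  # Test#nl: not less
        5: of,                        # Test#o:  overflow
        6: not of,                    # Test#no: not overflow
        7: cf,                        # Test#c:  carry
        8: not cf,                    # Test#nc: not carry
        9: zf,                        # Test#z:  zero
        10: not zf,                   # Test#nz: not zero
        11: sf,                       # Test#s:  sign
    }
    if cond not in table:
        raise ValueError(f"unknown condition code: {cond}")
    return bool(table[cond])
```

Each "not" condition is the exact negation of its counterpart, so for any flag state exactly one of Test#g and Test#ng holds, and likewise for Test#l and Test#nl.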
2.1.3.12.8.13  Instruction: Jmp [Jmp]

The Jmp instruction jumps to a target address unconditionally.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Register      | register  |                                 | typ | operator  |
Immediate     | literal                       |             | typ | operator  |
RI            | register  | literal                     |   | typ | operator  |

* registers: 6-bit register code
* literal: literal constant value, either a 16-bit or an 8-bit integer.
* flags: reserved space, for instruction extension use

The Jmp instruction has the following variants:

  • Register variant:

    • Syntax: :jmp short dst
    • Type code: 0, R
    • Description: jump to the target address held in register dst

      • dst: target register
    • Flags: none
  • Immediate variant:

    • Syntax: :jmp near offset
    • Type code: 1, I
    • Description: jump to the address at offset offset from the function entry point

      • offset: target address offset
    • Flags: none
  • Register-Immediate variant:

    • Syntax: :jmp far dst, offset
    • Type code: 2, RI
    • Description: jump to the function with index dst in the function unit vector, plus the address offset offset

      • dst: target register
      • offset: immediate offset
    • Flags: none

In short, the jmp instruction jumps to a target address unconditionally. It is used to transfer control flow during program execution.
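How the three variants pick their target can be sketched as one resolver. Everything here is illustrative: the function name jmp_target, the register map regs, the current function entry address entry and the list function_entries standing in for the function unit vector are all hypothetical, and the far variant assumes that register dst holds the function index.

```python
def jmp_target(variant, regs, entry, function_entries, dst=None, offset=0):
    """Return the address a jmp transfers control to.

    regs maps register codes to values, entry is the entry address of the
    current function, function_entries lists the entry address of each function.
    """
    if variant == "short":     # type code 0: address comes straight from the register
        return regs[dst]
    if variant == "near":      # type code 1: offset from the function entry point
        return entry + offset
    if variant == "far":       # type code 2: function index in dst (assumed), plus offset
        return function_entries[regs[dst]] + offset
    raise ValueError(f"unknown jmp variant: {variant}")
```

Under this reading, near jumps stay inside the current function, while far jumps can land inside any function listed in the vector.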

2.1.3.12.8.14  Instruction: Loop [Loop]

The Loop instruction performs a loop operation using the counter register.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Immediate     | literal                       |             | typ | operator  |

* literal: literal constant value, either a 16-bit or an 8-bit integer.

The Loop instruction takes one parameter.

  • Syntax: :loop addr
  • Description: decrement the counter register Reg#C; if it is not zero, jump to target address addr

    • addr: target address to jump to if the counter is not zero
  • Flags: none

Like the x86 loop instruction, Loop decrements the counter register Reg#C by 1.
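A minimal sketch of the decrement-and-branch behaviour described above. The register file is modelled as a dictionary whose key "C" stands for Reg#C, and `next_pc` (the address of the instruction after the loop) is a hypothetical parameter. Register width and what happens when the counter is already zero are not specified in these notes, so the sketch does not model them.

```python
def step_loop(regs, addr, next_pc):
    """Execute one `:loop addr` and return the next program counter."""
    regs["C"] -= 1                      # decrement Reg#C by 1
    return addr if regs["C"] != 0 else next_pc


# A counter of 3 takes the branch twice, then falls through.
regs = {"C": 3}
trace = [step_loop(regs, addr=0, next_pc=4) for _ in range(3)]
# trace == [0, 0, 4]
```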

2.1.3.12.8.15  Instruction: Call [Call]

The Call instruction calls the function at a target address.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Register      | register  |                                 | typ | operator  |
Immediate     | literal                       |             | typ | operator  |

* registers: 6 bits register code
* literal: literal constant value, either in 16 bits or 8 bits integer.
* flags: reserved space, for instruction extension use
* RI and IR are two variants of the same instruction, distinguished by instruction type

The Call instruction has the following variants:

  • Register variant:

    • Syntax: :call dst
    • Type code: 0, R
    • Description: call the function at the target address held in register dst

      • dst: target register
    • Flags: none
  • Immediate variant:

    • Syntax: :call idx
    • Type code: 1, I
    • Description: call the function with index idx in the function unit vector

      • idx: target index
    • Flags: none

Basically, the call instruction provides a way to call a function. All necessary call setup must be done before the call instruction executes. The top of the execution stack always tracks both a pointer to the function and that function's execution status, so the return address is stored automatically when call executes. The call instruction pushes a new execution context onto the execution stack.
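The following sketch illustrates why no explicit return address is needed. It covers only the immediate variant. The `Frame` class and its field names are hypothetical; they stand in for "pointer to function" and "execution status" as described above.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    func: int  # index into the function unit vector (assumed representation)
    pc: int    # execution status: offset of the next instruction in func


def call_immediate(stack, idx):
    """`:call idx`: push a fresh context for function idx.

    The caller's frame stays beneath the new one, so its pc field
    already records where execution resumes after the callee returns.
    """
    stack.append(Frame(func=idx, pc=0))


stack = [Frame(func=0, pc=12)]
call_immediate(stack, 3)
# stack[-1] is the callee's context; stack[-2].pc is the implicit return address
```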

2.1.3.12.8.16  Instruction: Ret [Ret]

The Ret instruction returns from a function call.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Zero          | flags                                       | typ | operator  |

* flags: reserved space, for instruction extension use

The Ret instruction has no parameters.

  • Syntax: :ret
  • Description: return from the current function call to the caller function
  • Flags: none

Basically, the ret instruction provides a way to return from a function call. When ret executes, the current execution context is popped from the execution stack, and Reg#PC and Reg#EP are restored to the caller function's context.
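As a rough sketch of the pop-and-restore step, assume each execution context is a dictionary holding the saved values of Reg#PC and Reg#EP under the hypothetical keys "pc" and "ep". Behaviour when the bottom context returns is not described in these notes, so the sketch simply raises an error in that case.

```python
def ret(stack, regs):
    """`:ret`: pop the current context and restore the caller's registers."""
    if len(stack) < 2:
        # Assumption: returning from the outermost context is an error.
        raise RuntimeError("no caller context to return to")
    stack.pop()                 # discard the callee's context
    caller = stack[-1]          # caller's context is now on top
    regs["PC"] = caller["pc"]   # restore Reg#PC
    regs["EP"] = caller["ep"]   # restore Reg#EP


stack = [{"pc": 12, "ep": 0x100}, {"pc": 4, "ep": 0x200}]
regs = {"PC": 4, "EP": 0x200}
ret(stack, regs)
# regs now holds the caller's values: PC == 12, EP == 0x100
```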

2.1.3.12.8.17  Instruction: IRet [IRet]

The IRet instruction returns from an interrupt handler.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
Zero          | flags                                       | typ | operator  |

* flags: reserved space, for instruction extension use

The IRet instruction has no parameters.

  • Syntax: :iret
  • Description: return from the current interrupt handler to the interrupted context
  • Flags: none

Basically, the iret instruction provides a way to return from an interrupt handler. When iret executes, the register state saved when the interrupt occurred is restored. The execution stack is then popped, and execution continues in the previously executing function.

2.1.3.12.8.18  Instruction: RegF [RegF]

The RegF instruction registers a new function.

0x
              20              18              10              08              00
              |3 3 2 2 2 2 2 2|2 2 2 2 1 1 1 1|1 1 1 1 1 1 0 0|0 0 0 0 0 0 0 0|
Decimal       |1 0 9 8 7 6 5 4|3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0|
--------------------------------------------------------------------------------
RR            | register  | register  |                     | typ | operator  |

* registers: 6 bits register code

The RegF instruction has two parameters.

  • Syntax: :regf skip, len
  • Description: register a new function with code length len, then skip skip bytes after registration

    • skip: number of bytes to skip after registration
    • len: length of the function code in bytes
  • Flags: none

The RegF instruction creates a new function unit and assigns the given data as its text. If skip or len is not aligned to the instruction size, an invalid instruction exception is raised.
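The alignment rule can be sketched as follows. Several details here are assumptions rather than part of the specification: the instruction size of 4 bytes is inferred from the 32-bit encoding diagram, `InvalidInstruction` is a stand-in for the machine's exception, and the function text is assumed to be the bytes handed in as `following` (for example, the bytes right after the regf instruction).

```python
INSTRUCTION_SIZE = 4  # bytes; assumed from the 32-bit encoding diagram


class InvalidInstruction(Exception):
    """Hypothetical stand-in for the invalid instruction exception."""


def regf(functions, following, skip, length):
    """Register following[:length] as a new function unit.

    Returns the index of the new unit in the function unit vector.
    The caller is expected to advance execution by `skip` bytes afterwards.
    """
    if skip % INSTRUCTION_SIZE != 0 or length % INSTRUCTION_SIZE != 0:
        raise InvalidInstruction("skip and len must be multiples of %d"
                                 % INSTRUCTION_SIZE)
    functions.append(following[:length])
    return len(functions) - 1
```

Checking both operands before touching the function unit vector keeps a misaligned regf from leaving a half-registered function behind.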

2.1.3.12.8.19  Instruction: Stack [Stack]

The Stack instruction manipulates the global data stack.